This is the second workshop designed for the Cancer MSc students at the UCL Cancer Institute to gain some confidence in using the R (statistical) programming language in their MSc projects. I would appreciate it if you took part in this pre-course survey (once again) so that I know what you expect from today’s workshop.
After failing my practical driving test twice, I took some time off from driving lessons. The reason was partly financial. It was also the beginning of the shorter days of winter, and I was a bit worried about taking the exam during those cold days when the roads in Edinburgh became a bit tricky to drive on. The next summer, when I tried to contact my driving instructor, I learned to my surprise that he had changed his profession. Well, my wife still blames me in secret. To be honest with you, it was not the first time that a driving instructor of mine had stopped training people (though my first instructor had some family responsibilities and took a break).
So, I went to my third driving instructor, Bill. He was in his late 70s, I guess, and initially I had a hard time understanding what he was saying. You may think that this is not ideal at all for a driving lesson. But interestingly enough, it worked out in the end and I got my driving license this time. Anyway, Bill used to be an engineer, and after his retirement he started a second career as a driving instructor. On the first day, Bill told me to forget everything I had learned about driving so far. I was a bit shocked by his condescending approach, but when he started the lesson it felt like he was teaching me the grammar of driving - how to control the clutch, how to read the mind of the driver of an oncoming vehicle, and so on.
By now, you may have started to wonder what Bill has
to do with you or this workshop. Bear with me. Each time I think about
two R packages, namely dplyr and ggplot2, they
remind me of Bill. In this workshop on exploratory data analysis using
R, we will learn the grammar of data manipulation and the grammar
of graphics to draw fancy plots for our exploratory analysis. What you
learned in the previous workshop was not even the tip of the
iceberg of plotting with R. The functions that we used were a
bit rigid and gave you little control. But here, with the
ggplot2 package, we will shape the plots as we wish. We will
draw layer upon layer to incorporate many aspects of the data in a
single plot. And for the data handling, we will use the dplyr
package. We will add layers of functions, step by step as we think, to build our data
structure for downstream analysis. And as a whole, we will try to tell a
story with our data and the plots that we will generate today.
dplyr
Trust me, this is the part of my research where I spend a significant portion of my time. Real-life data are not polished and nicely annotated. And when you want to integrate data from different sources, the fun begins (I am making air quotes, of course)! You also need to format the output of one process to make it suitable for the next one. So, there’s no escape from formatting and manipulating data in real life.
Here, we will be using the dplyr package, which is one of
the most powerful and popular packages in R. The d
stands for data and plyr is meant to evoke the tool
pliers. So the name dplyr refers to a tool for
manipulating data frames. dplyr provides a grammar of data
manipulation: its functions act as the verbs of
your code, and they solve most common data
manipulation problems efficiently. They are, arguably, often more efficient than the
equivalent base R operations.
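To make the idea of verbs a little more concrete, here is a minimal sketch that does the same task twice, once with base R indexing and once with two dplyr verbs. It assumes dplyr is already installed (installation is covered next) and uses the built-in iris data; filter() is one of the verbs, shown here only as a preview, and the object names are made up for illustration.

```r
library(dplyr)

# Base R: pick rows and columns with bracket indexing
base_result <- iris[iris$Species == "setosa", c("Sepal.Length", "Species")]

# dplyr: the same two steps written as verbs
setosa_rows  <- filter(iris, Species == "setosa")          # keep matching rows
dplyr_result <- select(setosa_rows, Sepal.Length, Species) # keep two columns

# Both approaches return the same 50 measurements
nrow(dplyr_result)
all(base_result$Sepal.Length == dplyr_result$Sepal.Length)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes them easy to chain together.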
There are two main ways to install the dplyr package in
R. You can install the tidyverse package,
and dplyr, being a part of it, will automatically be
installed in your R environment.
# install.packages("tidyverse")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Alternatively, you can install just the dplyr package -
# install.packages("dplyr")
library(dplyr)
However, if you want to install the development version, which I would not recommend, you can use the code below -
# if (packageVersion("devtools") < 1.6) {
# install.packages("devtools")
# }
# devtools::install_github("hadley/lazyeval")
# devtools::install_github("hadley/dplyr")
library(dplyr)
It would be a crime not to introduce the pipe operator
%>% to you before starting with the dplyr
verbs. If you are familiar with the pipe operator | in bash
scripting, it is the same idea; I have no better way to describe it. But
if you are not, here is an explanation -
The pipe operator %>% connects two operations on the
same data (be it a vector or a data frame). It passes the output of
the left-hand side of the pipe as the first argument
to the function on the right-hand side. If you want an
informal definition - x %>% f(y) is converted into
f(x, y) by the pipe operator. Let’s look at another
example. If we have a vector x that holds the values from 1 to 100 and we
want to calculate the mean of x and round it to an integer, in base R we
write -
x <- 1:100
round(mean(x))
## [1] 50
However, using the pipe operator, we can first define x, then calculate the mean and, at the end, round it to an integer, like -
x <- 1:100
x %>% mean %>% round
## [1] 50
The code reads from left to right, just as we think when we build our data analysis
pipeline. R itself (version 4.1.0 and later) also provides a native pipe operator,
|>, which works with dplyr, but I will stick to
%>% in the workshop.
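As a small sketch of the rule that x %>% f(y) becomes f(x, y), the snippet below compares a piped call with its nested equivalent. The vector values and the object names are made up for illustration, and the last line assumes R 4.1.0 or later for the native pipe.

```r
library(dplyr)  # dplyr re-exports %>% from the magrittr package

x <- c(1.234, 5.678, 9.1011)

# x %>% round(1) is evaluated as round(x, 1)
piped  <- x %>% round(1)
nested <- round(x, 1)
identical(piped, nested)

# A longer chain reads left to right: take the mean, then round it
chained <- x %>% mean() %>% round(2)

# The same chain with the native pipe (assumes R >= 4.1.0)
native <- x |> mean() |> round(2)
identical(chained, native)
```

Writing the parentheses explicitly, as in mean() and round(2), works with both pipes, whereas the bare form x %>% mean is only accepted by %>%.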
There are many verbs in the dplyr package. Here I will discuss a few very important ones that you will need to resolve most of the data manipulation challenges in your day-to-day work.
select() picks variables based on their names or types.
For example -
library(kableExtra)
## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
# using specific variable names -
iris %>% select(Sepal.Length, Sepal.Width) %>% kable(align = "lccrr", caption = "iris data: Sepal length and width") %>% kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>% scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| 4.6 | 3.4 |
| 5.0 | 3.4 |
| 4.4 | 2.9 |
| 4.9 | 3.1 |
| 5.4 | 3.7 |
| 4.8 | 3.4 |
| 4.8 | 3.0 |
| 4.3 | 3.0 |
| 5.8 | 4.0 |
| 5.7 | 4.4 |
| 5.4 | 3.9 |
| 5.1 | 3.5 |
| 5.7 | 3.8 |
| 5.1 | 3.8 |
| 5.4 | 3.4 |
| 5.1 | 3.7 |
| 4.6 | 3.6 |
| 5.1 | 3.3 |
| 4.8 | 3.4 |
| 5.0 | 3.0 |
| 5.0 | 3.4 |
| 5.2 | 3.5 |
| 5.2 | 3.4 |
| 4.7 | 3.2 |
| 4.8 | 3.1 |
| 5.4 | 3.4 |
| 5.2 | 4.1 |
| 5.5 | 4.2 |
| 4.9 | 3.1 |
| 5.0 | 3.2 |
| 5.5 | 3.5 |
| 4.9 | 3.6 |
| 4.4 | 3.0 |
| 5.1 | 3.4 |
| 5.0 | 3.5 |
| 4.5 | 2.3 |
| 4.4 | 3.2 |
| 5.0 | 3.5 |
| 5.1 | 3.8 |
| 4.8 | 3.0 |
| 5.1 | 3.8 |
| 4.6 | 3.2 |
| 5.3 | 3.7 |
| 5.0 | 3.3 |
| 7.0 | 3.2 |
| 6.4 | 3.2 |
| 6.9 | 3.1 |
| 5.5 | 2.3 |
| 6.5 | 2.8 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 4.9 | 2.4 |
| 6.6 | 2.9 |
| 5.2 | 2.7 |
| 5.0 | 2.0 |
| 5.9 | 3.0 |
| 6.0 | 2.2 |
| 6.1 | 2.9 |
| 5.6 | 2.9 |
| 6.7 | 3.1 |
| 5.6 | 3.0 |
| 5.8 | 2.7 |
| 6.2 | 2.2 |
| 5.6 | 2.5 |
| 5.9 | 3.2 |
| 6.1 | 2.8 |
| 6.3 | 2.5 |
| 6.1 | 2.8 |
| 6.4 | 2.9 |
| 6.6 | 3.0 |
| 6.8 | 2.8 |
| 6.7 | 3.0 |
| 6.0 | 2.9 |
| 5.7 | 2.6 |
| 5.5 | 2.4 |
| 5.5 | 2.4 |
| 5.8 | 2.7 |
| 6.0 | 2.7 |
| 5.4 | 3.0 |
| 6.0 | 3.4 |
| 6.7 | 3.1 |
| 6.3 | 2.3 |
| 5.6 | 3.0 |
| 5.5 | 2.5 |
| 5.5 | 2.6 |
| 6.1 | 3.0 |
| 5.8 | 2.6 |
| 5.0 | 2.3 |
| 5.6 | 2.7 |
| 5.7 | 3.0 |
| 5.7 | 2.9 |
| 6.2 | 2.9 |
| 5.1 | 2.5 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 5.8 | 2.7 |
| 7.1 | 3.0 |
| 6.3 | 2.9 |
| 6.5 | 3.0 |
| 7.6 | 3.0 |
| 4.9 | 2.5 |
| 7.3 | 2.9 |
| 6.7 | 2.5 |
| 7.2 | 3.6 |
| 6.5 | 3.2 |
| 6.4 | 2.7 |
| 6.8 | 3.0 |
| 5.7 | 2.5 |
| 5.8 | 2.8 |
| 6.4 | 3.2 |
| 6.5 | 3.0 |
| 7.7 | 3.8 |
| 7.7 | 2.6 |
| 6.0 | 2.2 |
| 6.9 | 3.2 |
| 5.6 | 2.8 |
| 7.7 | 2.8 |
| 6.3 | 2.7 |
| 6.7 | 3.3 |
| 7.2 | 3.2 |
| 6.2 | 2.8 |
| 6.1 | 3.0 |
| 6.4 | 2.8 |
| 7.2 | 3.0 |
| 7.4 | 2.8 |
| 7.9 | 3.8 |
| 6.4 | 2.8 |
| 6.3 | 2.8 |
| 6.1 | 2.6 |
| 7.7 | 3.0 |
| 6.3 | 3.4 |
| 6.4 | 3.1 |
| 6.0 | 3.0 |
| 6.9 | 3.1 |
| 6.7 | 3.1 |
| 6.9 | 3.1 |
| 5.8 | 2.7 |
| 6.8 | 3.2 |
| 6.7 | 3.3 |
| 6.7 | 3.0 |
| 6.3 | 2.5 |
| 6.5 | 3.0 |
| 6.2 | 3.4 |
| 5.9 | 3.0 |
# using type -
iris %>% select(where(is.numeric)) %>%
kable(align = "lccrr",caption = "iris data: numeric columns only") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 |
| 4.9 | 3.0 | 1.4 | 0.2 |
| 4.7 | 3.2 | 1.3 | 0.2 |
| 4.6 | 3.1 | 1.5 | 0.2 |
| 5.0 | 3.6 | 1.4 | 0.2 |
| 5.4 | 3.9 | 1.7 | 0.4 |
| 4.6 | 3.4 | 1.4 | 0.3 |
| 5.0 | 3.4 | 1.5 | 0.2 |
| 4.4 | 2.9 | 1.4 | 0.2 |
| 4.9 | 3.1 | 1.5 | 0.1 |
| 5.4 | 3.7 | 1.5 | 0.2 |
| 4.8 | 3.4 | 1.6 | 0.2 |
| 4.8 | 3.0 | 1.4 | 0.1 |
| 4.3 | 3.0 | 1.1 | 0.1 |
| 5.8 | 4.0 | 1.2 | 0.2 |
| 5.7 | 4.4 | 1.5 | 0.4 |
| 5.4 | 3.9 | 1.3 | 0.4 |
| 5.1 | 3.5 | 1.4 | 0.3 |
| 5.7 | 3.8 | 1.7 | 0.3 |
| 5.1 | 3.8 | 1.5 | 0.3 |
| 5.4 | 3.4 | 1.7 | 0.2 |
| 5.1 | 3.7 | 1.5 | 0.4 |
| 4.6 | 3.6 | 1.0 | 0.2 |
| 5.1 | 3.3 | 1.7 | 0.5 |
| 4.8 | 3.4 | 1.9 | 0.2 |
| 5.0 | 3.0 | 1.6 | 0.2 |
| 5.0 | 3.4 | 1.6 | 0.4 |
| 5.2 | 3.5 | 1.5 | 0.2 |
| 5.2 | 3.4 | 1.4 | 0.2 |
| 4.7 | 3.2 | 1.6 | 0.2 |
| 4.8 | 3.1 | 1.6 | 0.2 |
| 5.4 | 3.4 | 1.5 | 0.4 |
| 5.2 | 4.1 | 1.5 | 0.1 |
| 5.5 | 4.2 | 1.4 | 0.2 |
| 4.9 | 3.1 | 1.5 | 0.2 |
| 5.0 | 3.2 | 1.2 | 0.2 |
| 5.5 | 3.5 | 1.3 | 0.2 |
| 4.9 | 3.6 | 1.4 | 0.1 |
| 4.4 | 3.0 | 1.3 | 0.2 |
| 5.1 | 3.4 | 1.5 | 0.2 |
| 5.0 | 3.5 | 1.3 | 0.3 |
| 4.5 | 2.3 | 1.3 | 0.3 |
| 4.4 | 3.2 | 1.3 | 0.2 |
| 5.0 | 3.5 | 1.6 | 0.6 |
| 5.1 | 3.8 | 1.9 | 0.4 |
| 4.8 | 3.0 | 1.4 | 0.3 |
| 5.1 | 3.8 | 1.6 | 0.2 |
| 4.6 | 3.2 | 1.4 | 0.2 |
| 5.3 | 3.7 | 1.5 | 0.2 |
| 5.0 | 3.3 | 1.4 | 0.2 |
| 7.0 | 3.2 | 4.7 | 1.4 |
| 6.4 | 3.2 | 4.5 | 1.5 |
| 6.9 | 3.1 | 4.9 | 1.5 |
| 5.5 | 2.3 | 4.0 | 1.3 |
| 6.5 | 2.8 | 4.6 | 1.5 |
| 5.7 | 2.8 | 4.5 | 1.3 |
| 6.3 | 3.3 | 4.7 | 1.6 |
| 4.9 | 2.4 | 3.3 | 1.0 |
| 6.6 | 2.9 | 4.6 | 1.3 |
| 5.2 | 2.7 | 3.9 | 1.4 |
| 5.0 | 2.0 | 3.5 | 1.0 |
| 5.9 | 3.0 | 4.2 | 1.5 |
| 6.0 | 2.2 | 4.0 | 1.0 |
| 6.1 | 2.9 | 4.7 | 1.4 |
| 5.6 | 2.9 | 3.6 | 1.3 |
| 6.7 | 3.1 | 4.4 | 1.4 |
| 5.6 | 3.0 | 4.5 | 1.5 |
| 5.8 | 2.7 | 4.1 | 1.0 |
| 6.2 | 2.2 | 4.5 | 1.5 |
| 5.6 | 2.5 | 3.9 | 1.1 |
| 5.9 | 3.2 | 4.8 | 1.8 |
| 6.1 | 2.8 | 4.0 | 1.3 |
| 6.3 | 2.5 | 4.9 | 1.5 |
| 6.1 | 2.8 | 4.7 | 1.2 |
| 6.4 | 2.9 | 4.3 | 1.3 |
| 6.6 | 3.0 | 4.4 | 1.4 |
| 6.8 | 2.8 | 4.8 | 1.4 |
| 6.7 | 3.0 | 5.0 | 1.7 |
| 6.0 | 2.9 | 4.5 | 1.5 |
| 5.7 | 2.6 | 3.5 | 1.0 |
| 5.5 | 2.4 | 3.8 | 1.1 |
| 5.5 | 2.4 | 3.7 | 1.0 |
| 5.8 | 2.7 | 3.9 | 1.2 |
| 6.0 | 2.7 | 5.1 | 1.6 |
| 5.4 | 3.0 | 4.5 | 1.5 |
| 6.0 | 3.4 | 4.5 | 1.6 |
| 6.7 | 3.1 | 4.7 | 1.5 |
| 6.3 | 2.3 | 4.4 | 1.3 |
| 5.6 | 3.0 | 4.1 | 1.3 |
| 5.5 | 2.5 | 4.0 | 1.3 |
| 5.5 | 2.6 | 4.4 | 1.2 |
| 6.1 | 3.0 | 4.6 | 1.4 |
| 5.8 | 2.6 | 4.0 | 1.2 |
| 5.0 | 2.3 | 3.3 | 1.0 |
| 5.6 | 2.7 | 4.2 | 1.3 |
| 5.7 | 3.0 | 4.2 | 1.2 |
| 5.7 | 2.9 | 4.2 | 1.3 |
| 6.2 | 2.9 | 4.3 | 1.3 |
| 5.1 | 2.5 | 3.0 | 1.1 |
| 5.7 | 2.8 | 4.1 | 1.3 |
| 6.3 | 3.3 | 6.0 | 2.5 |
| 5.8 | 2.7 | 5.1 | 1.9 |
| 7.1 | 3.0 | 5.9 | 2.1 |
| 6.3 | 2.9 | 5.6 | 1.8 |
| 6.5 | 3.0 | 5.8 | 2.2 |
| 7.6 | 3.0 | 6.6 | 2.1 |
| 4.9 | 2.5 | 4.5 | 1.7 |
| 7.3 | 2.9 | 6.3 | 1.8 |
| 6.7 | 2.5 | 5.8 | 1.8 |
| 7.2 | 3.6 | 6.1 | 2.5 |
| 6.5 | 3.2 | 5.1 | 2.0 |
| 6.4 | 2.7 | 5.3 | 1.9 |
| 6.8 | 3.0 | 5.5 | 2.1 |
| 5.7 | 2.5 | 5.0 | 2.0 |
| 5.8 | 2.8 | 5.1 | 2.4 |
| 6.4 | 3.2 | 5.3 | 2.3 |
| 6.5 | 3.0 | 5.5 | 1.8 |
| 7.7 | 3.8 | 6.7 | 2.2 |
| 7.7 | 2.6 | 6.9 | 2.3 |
| 6.0 | 2.2 | 5.0 | 1.5 |
| 6.9 | 3.2 | 5.7 | 2.3 |
| 5.6 | 2.8 | 4.9 | 2.0 |
| 7.7 | 2.8 | 6.7 | 2.0 |
| 6.3 | 2.7 | 4.9 | 1.8 |
| 6.7 | 3.3 | 5.7 | 2.1 |
| 7.2 | 3.2 | 6.0 | 1.8 |
| 6.2 | 2.8 | 4.8 | 1.8 |
| 6.1 | 3.0 | 4.9 | 1.8 |
| 6.4 | 2.8 | 5.6 | 2.1 |
| 7.2 | 3.0 | 5.8 | 1.6 |
| 7.4 | 2.8 | 6.1 | 1.9 |
| 7.9 | 3.8 | 6.4 | 2.0 |
| 6.4 | 2.8 | 5.6 | 2.2 |
| 6.3 | 2.8 | 5.1 | 1.5 |
| 6.1 | 2.6 | 5.6 | 1.4 |
| 7.7 | 3.0 | 6.1 | 2.3 |
| 6.3 | 3.4 | 5.6 | 2.4 |
| 6.4 | 3.1 | 5.5 | 1.8 |
| 6.0 | 3.0 | 4.8 | 1.8 |
| 6.9 | 3.1 | 5.4 | 2.1 |
| 6.7 | 3.1 | 5.6 | 2.4 |
| 6.9 | 3.1 | 5.1 | 2.3 |
| 5.8 | 2.7 | 5.1 | 1.9 |
| 6.8 | 3.2 | 5.9 | 2.3 |
| 6.7 | 3.3 | 5.7 | 2.5 |
| 6.7 | 3.0 | 5.2 | 2.3 |
| 6.3 | 2.5 | 5.0 | 1.9 |
| 6.5 | 3.0 | 5.2 | 2.0 |
| 6.2 | 3.4 | 5.4 | 2.3 |
| 5.9 | 3.0 | 5.1 | 1.8 |
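Type-based selection accepts any predicate function wrapped in where(), not only is.numeric. The sketch below, with illustrative object names and an arbitrary cut-off of 3.5, selects the factor column and then the numeric columns whose mean exceeds that cut-off.

```r
library(dplyr)

# Columns of a given type: here, factors only
factor_cols <- iris %>% select(where(is.factor))
names(factor_cols)

# A custom predicate: numeric columns with a mean above 3.5
# (the 3.5 threshold is arbitrary, chosen only for illustration)
big_means <- iris %>%
  select(where(function(col) is.numeric(col) && mean(col) > 3.5))
names(big_means)
```

The predicate is called once per column and must return a single TRUE or FALSE, which is why the numeric check comes before mean().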
The select() verb comes with some selection
helpers.
If you want to select all the variables, you can use
everything() -
iris %>% select(everything()) %>%
kable(align = "lccrr",caption = "iris data: everything") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
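On its own, everything() simply returns the whole table. It tends to be more useful next to other selections, for example to move one column to the front and keep the rest in their original order. A minimal sketch, with an illustrative object name:

```r
library(dplyr)

# Put Species first; everything() fills in the remaining columns
reordered <- iris %>% select(Species, everything())
names(reordered)
```

Columns that were already named explicitly are not repeated, so Species appears only once in the result.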
You can choose the last column using last_col(), or only
the grouping columns using group_cols() (you will
understand this better when I discuss the group_by() verb
later).
iris %>% select(last_col()) %>%
kable(align = "lccrr",caption = "iris data: last_col()") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Species |
|---|
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| versicolor |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
| virginica |
iris %>% group_by(Sepal.Length,Sepal.Width) %>% select(group_cols()) %>%
kable(align = "lccrr",caption = "iris data: select grouped columns") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| 4.6 | 3.4 |
| 5.0 | 3.4 |
| 4.4 | 2.9 |
| 4.9 | 3.1 |
| 5.4 | 3.7 |
| 4.8 | 3.4 |
| 4.8 | 3.0 |
| 4.3 | 3.0 |
| 5.8 | 4.0 |
| 5.7 | 4.4 |
| 5.4 | 3.9 |
| 5.1 | 3.5 |
| 5.7 | 3.8 |
| 5.1 | 3.8 |
| 5.4 | 3.4 |
| 5.1 | 3.7 |
| 4.6 | 3.6 |
| 5.1 | 3.3 |
| 4.8 | 3.4 |
| 5.0 | 3.0 |
| 5.0 | 3.4 |
| 5.2 | 3.5 |
| 5.2 | 3.4 |
| 4.7 | 3.2 |
| 4.8 | 3.1 |
| 5.4 | 3.4 |
| 5.2 | 4.1 |
| 5.5 | 4.2 |
| 4.9 | 3.1 |
| 5.0 | 3.2 |
| 5.5 | 3.5 |
| 4.9 | 3.6 |
| 4.4 | 3.0 |
| 5.1 | 3.4 |
| 5.0 | 3.5 |
| 4.5 | 2.3 |
| 4.4 | 3.2 |
| 5.0 | 3.5 |
| 5.1 | 3.8 |
| 4.8 | 3.0 |
| 5.1 | 3.8 |
| 4.6 | 3.2 |
| 5.3 | 3.7 |
| 5.0 | 3.3 |
| 7.0 | 3.2 |
| 6.4 | 3.2 |
| 6.9 | 3.1 |
| 5.5 | 2.3 |
| 6.5 | 2.8 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 4.9 | 2.4 |
| 6.6 | 2.9 |
| 5.2 | 2.7 |
| 5.0 | 2.0 |
| 5.9 | 3.0 |
| 6.0 | 2.2 |
| 6.1 | 2.9 |
| 5.6 | 2.9 |
| 6.7 | 3.1 |
| 5.6 | 3.0 |
| 5.8 | 2.7 |
| 6.2 | 2.2 |
| 5.6 | 2.5 |
| 5.9 | 3.2 |
| 6.1 | 2.8 |
| 6.3 | 2.5 |
| 6.1 | 2.8 |
| 6.4 | 2.9 |
| 6.6 | 3.0 |
| 6.8 | 2.8 |
| 6.7 | 3.0 |
| 6.0 | 2.9 |
| 5.7 | 2.6 |
| 5.5 | 2.4 |
| 5.5 | 2.4 |
| 5.8 | 2.7 |
| 6.0 | 2.7 |
| 5.4 | 3.0 |
| 6.0 | 3.4 |
| 6.7 | 3.1 |
| 6.3 | 2.3 |
| 5.6 | 3.0 |
| 5.5 | 2.5 |
| 5.5 | 2.6 |
| 6.1 | 3.0 |
| 5.8 | 2.6 |
| 5.0 | 2.3 |
| 5.6 | 2.7 |
| 5.7 | 3.0 |
| 5.7 | 2.9 |
| 6.2 | 2.9 |
| 5.1 | 2.5 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 5.8 | 2.7 |
| 7.1 | 3.0 |
| 6.3 | 2.9 |
| 6.5 | 3.0 |
| 7.6 | 3.0 |
| 4.9 | 2.5 |
| 7.3 | 2.9 |
| 6.7 | 2.5 |
| 7.2 | 3.6 |
| 6.5 | 3.2 |
| 6.4 | 2.7 |
| 6.8 | 3.0 |
| 5.7 | 2.5 |
| 5.8 | 2.8 |
| 6.4 | 3.2 |
| 6.5 | 3.0 |
| 7.7 | 3.8 |
| 7.7 | 2.6 |
| 6.0 | 2.2 |
| 6.9 | 3.2 |
| 5.6 | 2.8 |
| 7.7 | 2.8 |
| 6.3 | 2.7 |
| 6.7 | 3.3 |
| 7.2 | 3.2 |
| 6.2 | 2.8 |
| 6.1 | 3.0 |
| 6.4 | 2.8 |
| 7.2 | 3.0 |
| 7.4 | 2.8 |
| 7.9 | 3.8 |
| 6.4 | 2.8 |
| 6.3 | 2.8 |
| 6.1 | 2.6 |
| 7.7 | 3.0 |
| 6.3 | 3.4 |
| 6.4 | 3.1 |
| 6.0 | 3.0 |
| 6.9 | 3.1 |
| 6.7 | 3.1 |
| 6.9 | 3.1 |
| 5.8 | 2.7 |
| 6.8 | 3.2 |
| 6.7 | 3.3 |
| 6.7 | 3.0 |
| 6.3 | 2.5 |
| 6.5 | 3.0 |
| 6.2 | 3.4 |
| 5.9 | 3.0 |
If some column names share a common prefix or suffix, you can
make use of that with the selection helpers starts_with() or
ends_with(), respectively -
iris %>% select(starts_with("Sepal")) %>%
kable(align = "lccrr",caption = "iris data: columns starting with Sepal") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| 4.6 | 3.4 |
| 5.0 | 3.4 |
| 4.4 | 2.9 |
| 4.9 | 3.1 |
| 5.4 | 3.7 |
| 4.8 | 3.4 |
| 4.8 | 3.0 |
| 4.3 | 3.0 |
| 5.8 | 4.0 |
| 5.7 | 4.4 |
| 5.4 | 3.9 |
| 5.1 | 3.5 |
| 5.7 | 3.8 |
| 5.1 | 3.8 |
| 5.4 | 3.4 |
| 5.1 | 3.7 |
| 4.6 | 3.6 |
| 5.1 | 3.3 |
| 4.8 | 3.4 |
| 5.0 | 3.0 |
| 5.0 | 3.4 |
| 5.2 | 3.5 |
| 5.2 | 3.4 |
| 4.7 | 3.2 |
| 4.8 | 3.1 |
| 5.4 | 3.4 |
| 5.2 | 4.1 |
| 5.5 | 4.2 |
| 4.9 | 3.1 |
| 5.0 | 3.2 |
| 5.5 | 3.5 |
| 4.9 | 3.6 |
| 4.4 | 3.0 |
| 5.1 | 3.4 |
| 5.0 | 3.5 |
| 4.5 | 2.3 |
| 4.4 | 3.2 |
| 5.0 | 3.5 |
| 5.1 | 3.8 |
| 4.8 | 3.0 |
| 5.1 | 3.8 |
| 4.6 | 3.2 |
| 5.3 | 3.7 |
| 5.0 | 3.3 |
| 7.0 | 3.2 |
| 6.4 | 3.2 |
| 6.9 | 3.1 |
| 5.5 | 2.3 |
| 6.5 | 2.8 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 4.9 | 2.4 |
| 6.6 | 2.9 |
| 5.2 | 2.7 |
| 5.0 | 2.0 |
| 5.9 | 3.0 |
| 6.0 | 2.2 |
| 6.1 | 2.9 |
| 5.6 | 2.9 |
| 6.7 | 3.1 |
| 5.6 | 3.0 |
| 5.8 | 2.7 |
| 6.2 | 2.2 |
| 5.6 | 2.5 |
| 5.9 | 3.2 |
| 6.1 | 2.8 |
| 6.3 | 2.5 |
| 6.1 | 2.8 |
| 6.4 | 2.9 |
| 6.6 | 3.0 |
| 6.8 | 2.8 |
| 6.7 | 3.0 |
| 6.0 | 2.9 |
| 5.7 | 2.6 |
| 5.5 | 2.4 |
| 5.5 | 2.4 |
| 5.8 | 2.7 |
| 6.0 | 2.7 |
| 5.4 | 3.0 |
| 6.0 | 3.4 |
| 6.7 | 3.1 |
| 6.3 | 2.3 |
| 5.6 | 3.0 |
| 5.5 | 2.5 |
| 5.5 | 2.6 |
| 6.1 | 3.0 |
| 5.8 | 2.6 |
| 5.0 | 2.3 |
| 5.6 | 2.7 |
| 5.7 | 3.0 |
| 5.7 | 2.9 |
| 6.2 | 2.9 |
| 5.1 | 2.5 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 5.8 | 2.7 |
| 7.1 | 3.0 |
| 6.3 | 2.9 |
| 6.5 | 3.0 |
| 7.6 | 3.0 |
| 4.9 | 2.5 |
| 7.3 | 2.9 |
| 6.7 | 2.5 |
| 7.2 | 3.6 |
| 6.5 | 3.2 |
| 6.4 | 2.7 |
| 6.8 | 3.0 |
| 5.7 | 2.5 |
| 5.8 | 2.8 |
| 6.4 | 3.2 |
| 6.5 | 3.0 |
| 7.7 | 3.8 |
| 7.7 | 2.6 |
| 6.0 | 2.2 |
| 6.9 | 3.2 |
| 5.6 | 2.8 |
| 7.7 | 2.8 |
| 6.3 | 2.7 |
| 6.7 | 3.3 |
| 7.2 | 3.2 |
| 6.2 | 2.8 |
| 6.1 | 3.0 |
| 6.4 | 2.8 |
| 7.2 | 3.0 |
| 7.4 | 2.8 |
| 7.9 | 3.8 |
| 6.4 | 2.8 |
| 6.3 | 2.8 |
| 6.1 | 2.6 |
| 7.7 | 3.0 |
| 6.3 | 3.4 |
| 6.4 | 3.1 |
| 6.0 | 3.0 |
| 6.9 | 3.1 |
| 6.7 | 3.1 |
| 6.9 | 3.1 |
| 5.8 | 2.7 |
| 6.8 | 3.2 |
| 6.7 | 3.3 |
| 6.7 | 3.0 |
| 6.3 | 2.5 |
| 6.5 | 3.0 |
| 6.2 | 3.4 |
| 5.9 | 3.0 |
iris %>% select(ends_with("Length")) %>%
kable(align = "lccrr",caption = "iris data: columns ending with Length") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Petal.Length |
|---|---|
| 5.1 | 1.4 |
| 4.9 | 1.4 |
| 4.7 | 1.3 |
| 4.6 | 1.5 |
| 5.0 | 1.4 |
| 5.4 | 1.7 |
| 4.6 | 1.4 |
| 5.0 | 1.5 |
| 4.4 | 1.4 |
| 4.9 | 1.5 |
| 5.4 | 1.5 |
| 4.8 | 1.6 |
| 4.8 | 1.4 |
| 4.3 | 1.1 |
| 5.8 | 1.2 |
| 5.7 | 1.5 |
| 5.4 | 1.3 |
| 5.1 | 1.4 |
| 5.7 | 1.7 |
| 5.1 | 1.5 |
| 5.4 | 1.7 |
| 5.1 | 1.5 |
| 4.6 | 1.0 |
| 5.1 | 1.7 |
| 4.8 | 1.9 |
| 5.0 | 1.6 |
| 5.0 | 1.6 |
| 5.2 | 1.5 |
| 5.2 | 1.4 |
| 4.7 | 1.6 |
| 4.8 | 1.6 |
| 5.4 | 1.5 |
| 5.2 | 1.5 |
| 5.5 | 1.4 |
| 4.9 | 1.5 |
| 5.0 | 1.2 |
| 5.5 | 1.3 |
| 4.9 | 1.4 |
| 4.4 | 1.3 |
| 5.1 | 1.5 |
| 5.0 | 1.3 |
| 4.5 | 1.3 |
| 4.4 | 1.3 |
| 5.0 | 1.6 |
| 5.1 | 1.9 |
| 4.8 | 1.4 |
| 5.1 | 1.6 |
| 4.6 | 1.4 |
| 5.3 | 1.5 |
| 5.0 | 1.4 |
| 7.0 | 4.7 |
| 6.4 | 4.5 |
| 6.9 | 4.9 |
| 5.5 | 4.0 |
| 6.5 | 4.6 |
| 5.7 | 4.5 |
| 6.3 | 4.7 |
| 4.9 | 3.3 |
| 6.6 | 4.6 |
| 5.2 | 3.9 |
| 5.0 | 3.5 |
| 5.9 | 4.2 |
| 6.0 | 4.0 |
| 6.1 | 4.7 |
| 5.6 | 3.6 |
| 6.7 | 4.4 |
| 5.6 | 4.5 |
| 5.8 | 4.1 |
| 6.2 | 4.5 |
| 5.6 | 3.9 |
| 5.9 | 4.8 |
| 6.1 | 4.0 |
| 6.3 | 4.9 |
| 6.1 | 4.7 |
| 6.4 | 4.3 |
| 6.6 | 4.4 |
| 6.8 | 4.8 |
| 6.7 | 5.0 |
| 6.0 | 4.5 |
| 5.7 | 3.5 |
| 5.5 | 3.8 |
| 5.5 | 3.7 |
| 5.8 | 3.9 |
| 6.0 | 5.1 |
| 5.4 | 4.5 |
| 6.0 | 4.5 |
| 6.7 | 4.7 |
| 6.3 | 4.4 |
| 5.6 | 4.1 |
| 5.5 | 4.0 |
| 5.5 | 4.4 |
| 6.1 | 4.6 |
| 5.8 | 4.0 |
| 5.0 | 3.3 |
| 5.6 | 4.2 |
| 5.7 | 4.2 |
| 5.7 | 4.2 |
| 6.2 | 4.3 |
| 5.1 | 3.0 |
| 5.7 | 4.1 |
| 6.3 | 6.0 |
| 5.8 | 5.1 |
| 7.1 | 5.9 |
| 6.3 | 5.6 |
| 6.5 | 5.8 |
| 7.6 | 6.6 |
| 4.9 | 4.5 |
| 7.3 | 6.3 |
| 6.7 | 5.8 |
| 7.2 | 6.1 |
| 6.5 | 5.1 |
| 6.4 | 5.3 |
| 6.8 | 5.5 |
| 5.7 | 5.0 |
| 5.8 | 5.1 |
| 6.4 | 5.3 |
| 6.5 | 5.5 |
| 7.7 | 6.7 |
| 7.7 | 6.9 |
| 6.0 | 5.0 |
| 6.9 | 5.7 |
| 5.6 | 4.9 |
| 7.7 | 6.7 |
| 6.3 | 4.9 |
| 6.7 | 5.7 |
| 7.2 | 6.0 |
| 6.2 | 4.8 |
| 6.1 | 4.9 |
| 6.4 | 5.6 |
| 7.2 | 5.8 |
| 7.4 | 6.1 |
| 7.9 | 6.4 |
| 6.4 | 5.6 |
| 6.3 | 5.1 |
| 6.1 | 5.6 |
| 7.7 | 6.1 |
| 6.3 | 5.6 |
| 6.4 | 5.5 |
| 6.0 | 4.8 |
| 6.9 | 5.4 |
| 6.7 | 5.6 |
| 6.9 | 5.1 |
| 5.8 | 5.1 |
| 6.8 | 5.9 |
| 6.7 | 5.7 |
| 6.7 | 5.2 |
| 6.3 | 5.0 |
| 6.5 | 5.2 |
| 6.2 | 5.4 |
| 5.9 | 5.1 |
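Selection helpers can also be combined with the Boolean operators &, | and !. The sketch below is one way to do this on the iris data; the object names are only for illustration.

```r
library(dplyr)

# Intersection: starts with "Petal" AND ends with "Width"
petal_width <- iris %>% select(starts_with("Petal") & ends_with("Width"))
names(petal_width)

# Union: starts with "Sepal" OR ends with "Width"
sepal_or_width <- iris %>% select(starts_with("Sepal") | ends_with("Width"))
names(sepal_or_width)

# Negation: every column except those starting with "Sepal"
no_sepal <- iris %>% select(!starts_with("Sepal"))
names(no_sepal)
```

Combining helpers this way avoids typing out long lists of column names when a table has many similarly named variables.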
You can even select columns by a substring anywhere in their names, using
contains() -
iris %>% select(contains("dth")) %>%
kable(align = "lccrr",caption = "iris data: column names containing 'dth'") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Width | Petal.Width |
|---|---|
| 3.5 | 0.2 |
| 3.0 | 0.2 |
| 3.2 | 0.2 |
| 3.1 | 0.2 |
| 3.6 | 0.2 |
| 3.9 | 0.4 |
| 3.4 | 0.3 |
| 3.4 | 0.2 |
| 2.9 | 0.2 |
| 3.1 | 0.1 |
| 3.7 | 0.2 |
| 3.4 | 0.2 |
| 3.0 | 0.1 |
| 3.0 | 0.1 |
| 4.0 | 0.2 |
| 4.4 | 0.4 |
| 3.9 | 0.4 |
| 3.5 | 0.3 |
| 3.8 | 0.3 |
| 3.8 | 0.3 |
| 3.4 | 0.2 |
| 3.7 | 0.4 |
| 3.6 | 0.2 |
| 3.3 | 0.5 |
| 3.4 | 0.2 |
| 3.0 | 0.2 |
| 3.4 | 0.4 |
| 3.5 | 0.2 |
| 3.4 | 0.2 |
| 3.2 | 0.2 |
| 3.1 | 0.2 |
| 3.4 | 0.4 |
| 4.1 | 0.1 |
| 4.2 | 0.2 |
| 3.1 | 0.2 |
| 3.2 | 0.2 |
| 3.5 | 0.2 |
| 3.6 | 0.1 |
| 3.0 | 0.2 |
| 3.4 | 0.2 |
| 3.5 | 0.3 |
| 2.3 | 0.3 |
| 3.2 | 0.2 |
| 3.5 | 0.6 |
| 3.8 | 0.4 |
| 3.0 | 0.3 |
| 3.8 | 0.2 |
| 3.2 | 0.2 |
| 3.7 | 0.2 |
| 3.3 | 0.2 |
| 3.2 | 1.4 |
| 3.2 | 1.5 |
| 3.1 | 1.5 |
| 2.3 | 1.3 |
| 2.8 | 1.5 |
| 2.8 | 1.3 |
| 3.3 | 1.6 |
| 2.4 | 1.0 |
| 2.9 | 1.3 |
| 2.7 | 1.4 |
| 2.0 | 1.0 |
| 3.0 | 1.5 |
| 2.2 | 1.0 |
| 2.9 | 1.4 |
| 2.9 | 1.3 |
| 3.1 | 1.4 |
| 3.0 | 1.5 |
| 2.7 | 1.0 |
| 2.2 | 1.5 |
| 2.5 | 1.1 |
| 3.2 | 1.8 |
| 2.8 | 1.3 |
| 2.5 | 1.5 |
| 2.8 | 1.2 |
| 2.9 | 1.3 |
| 3.0 | 1.4 |
| 2.8 | 1.4 |
| 3.0 | 1.7 |
| 2.9 | 1.5 |
| 2.6 | 1.0 |
| 2.4 | 1.1 |
| 2.4 | 1.0 |
| 2.7 | 1.2 |
| 2.7 | 1.6 |
| 3.0 | 1.5 |
| 3.4 | 1.6 |
| 3.1 | 1.5 |
| 2.3 | 1.3 |
| 3.0 | 1.3 |
| 2.5 | 1.3 |
| 2.6 | 1.2 |
| 3.0 | 1.4 |
| 2.6 | 1.2 |
| 2.3 | 1.0 |
| 2.7 | 1.3 |
| 3.0 | 1.2 |
| 2.9 | 1.3 |
| 2.9 | 1.3 |
| 2.5 | 1.1 |
| 2.8 | 1.3 |
| 3.3 | 2.5 |
| 2.7 | 1.9 |
| 3.0 | 2.1 |
| 2.9 | 1.8 |
| 3.0 | 2.2 |
| 3.0 | 2.1 |
| 2.5 | 1.7 |
| 2.9 | 1.8 |
| 2.5 | 1.8 |
| 3.6 | 2.5 |
| 3.2 | 2.0 |
| 2.7 | 1.9 |
| 3.0 | 2.1 |
| 2.5 | 2.0 |
| 2.8 | 2.4 |
| 3.2 | 2.3 |
| 3.0 | 1.8 |
| 3.8 | 2.2 |
| 2.6 | 2.3 |
| 2.2 | 1.5 |
| 3.2 | 2.3 |
| 2.8 | 2.0 |
| 2.8 | 2.0 |
| 2.7 | 1.8 |
| 3.3 | 2.1 |
| 3.2 | 1.8 |
| 2.8 | 1.8 |
| 3.0 | 1.8 |
| 2.8 | 2.1 |
| 3.0 | 1.6 |
| 2.8 | 1.9 |
| 3.8 | 2.0 |
| 2.8 | 2.2 |
| 2.8 | 1.5 |
| 2.6 | 1.4 |
| 3.0 | 2.3 |
| 3.4 | 2.4 |
| 3.1 | 1.8 |
| 3.0 | 1.8 |
| 3.1 | 2.1 |
| 3.1 | 2.4 |
| 3.1 | 2.3 |
| 2.7 | 1.9 |
| 3.2 | 2.3 |
| 3.3 | 2.5 |
| 3.0 | 2.3 |
| 2.5 | 1.9 |
| 3.0 | 2.0 |
| 3.4 | 2.3 |
| 3.0 | 1.8 |
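One detail worth knowing is that contains() matches a literal string and ignores case by default. The sketch below shows the difference the ignore.case argument makes; the object names are only for illustration.

```r
library(dplyr)

# Default: case is ignored, so "WIDTH" still finds both Width columns
loose <- iris %>% select(contains("WIDTH"))
names(loose)

# Case-sensitive matching: no column name contains upper-case "WIDTH"
strict <- iris %>% select(contains("WIDTH", ignore.case = FALSE))
ncol(strict)
```

The same ignore.case argument is available in starts_with(), ends_with() and matches().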
You can also use a regular expression to select columns, using
matches() -
# column names containing w or d (matches() ignores case by default)
iris %>% select(matches("[Wd]")) %>%
kable(align = "lccrr",caption = "iris data: column name containing W or d") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Width | Petal.Width |
|---|---|
| 3.5 | 0.2 |
| 3.0 | 0.2 |
| 3.2 | 0.2 |
| 3.1 | 0.2 |
| 3.6 | 0.2 |
| 3.9 | 0.4 |
| 3.4 | 0.3 |
| 3.4 | 0.2 |
| 2.9 | 0.2 |
| 3.1 | 0.1 |
| 3.7 | 0.2 |
| 3.4 | 0.2 |
| 3.0 | 0.1 |
| 3.0 | 0.1 |
| 4.0 | 0.2 |
| 4.4 | 0.4 |
| 3.9 | 0.4 |
| 3.5 | 0.3 |
| 3.8 | 0.3 |
| 3.8 | 0.3 |
| 3.4 | 0.2 |
| 3.7 | 0.4 |
| 3.6 | 0.2 |
| 3.3 | 0.5 |
| 3.4 | 0.2 |
| 3.0 | 0.2 |
| 3.4 | 0.4 |
| 3.5 | 0.2 |
| 3.4 | 0.2 |
| 3.2 | 0.2 |
| 3.1 | 0.2 |
| 3.4 | 0.4 |
| 4.1 | 0.1 |
| 4.2 | 0.2 |
| 3.1 | 0.2 |
| 3.2 | 0.2 |
| 3.5 | 0.2 |
| 3.6 | 0.1 |
| 3.0 | 0.2 |
| 3.4 | 0.2 |
| 3.5 | 0.3 |
| 2.3 | 0.3 |
| 3.2 | 0.2 |
| 3.5 | 0.6 |
| 3.8 | 0.4 |
| 3.0 | 0.3 |
| 3.8 | 0.2 |
| 3.2 | 0.2 |
| 3.7 | 0.2 |
| 3.3 | 0.2 |
| 3.2 | 1.4 |
| 3.2 | 1.5 |
| 3.1 | 1.5 |
| 2.3 | 1.3 |
| 2.8 | 1.5 |
| 2.8 | 1.3 |
| 3.3 | 1.6 |
| 2.4 | 1.0 |
| 2.9 | 1.3 |
| 2.7 | 1.4 |
| 2.0 | 1.0 |
| 3.0 | 1.5 |
| 2.2 | 1.0 |
| 2.9 | 1.4 |
| 2.9 | 1.3 |
| 3.1 | 1.4 |
| 3.0 | 1.5 |
| 2.7 | 1.0 |
| 2.2 | 1.5 |
| 2.5 | 1.1 |
| 3.2 | 1.8 |
| 2.8 | 1.3 |
| 2.5 | 1.5 |
| 2.8 | 1.2 |
| 2.9 | 1.3 |
| 3.0 | 1.4 |
| 2.8 | 1.4 |
| 3.0 | 1.7 |
| 2.9 | 1.5 |
| 2.6 | 1.0 |
| 2.4 | 1.1 |
| 2.4 | 1.0 |
| 2.7 | 1.2 |
| 2.7 | 1.6 |
| 3.0 | 1.5 |
| 3.4 | 1.6 |
| 3.1 | 1.5 |
| 2.3 | 1.3 |
| 3.0 | 1.3 |
| 2.5 | 1.3 |
| 2.6 | 1.2 |
| 3.0 | 1.4 |
| 2.6 | 1.2 |
| 2.3 | 1.0 |
| 2.7 | 1.3 |
| 3.0 | 1.2 |
| 2.9 | 1.3 |
| 2.9 | 1.3 |
| 2.5 | 1.1 |
| 2.8 | 1.3 |
| 3.3 | 2.5 |
| 2.7 | 1.9 |
| 3.0 | 2.1 |
| 2.9 | 1.8 |
| 3.0 | 2.2 |
| 3.0 | 2.1 |
| 2.5 | 1.7 |
| 2.9 | 1.8 |
| 2.5 | 1.8 |
| 3.6 | 2.5 |
| 3.2 | 2.0 |
| 2.7 | 1.9 |
| 3.0 | 2.1 |
| 2.5 | 2.0 |
| 2.8 | 2.4 |
| 3.2 | 2.3 |
| 3.0 | 1.8 |
| 3.8 | 2.2 |
| 2.6 | 2.3 |
| 2.2 | 1.5 |
| 3.2 | 2.3 |
| 2.8 | 2.0 |
| 2.8 | 2.0 |
| 2.7 | 1.8 |
| 3.3 | 2.1 |
| 3.2 | 1.8 |
| 2.8 | 1.8 |
| 3.0 | 1.8 |
| 2.8 | 2.1 |
| 3.0 | 1.6 |
| 2.8 | 1.9 |
| 3.8 | 2.0 |
| 2.8 | 2.2 |
| 2.8 | 1.5 |
| 2.6 | 1.4 |
| 3.0 | 2.3 |
| 3.4 | 2.4 |
| 3.1 | 1.8 |
| 3.0 | 1.8 |
| 3.1 | 2.1 |
| 3.1 | 2.4 |
| 3.1 | 2.3 |
| 2.7 | 1.9 |
| 3.2 | 2.3 |
| 3.3 | 2.5 |
| 3.0 | 2.3 |
| 2.5 | 1.9 |
| 3.0 | 2.0 |
| 3.4 | 2.3 |
| 3.0 | 1.8 |
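To check which columns a regular expression picks up without printing the whole table, you can look at the column names only. The following is a minimal sketch, assuming dplyr is loaded; the pattern "^sepal" and the object names are illustrative and not part of the workshop material. It also shows that matches() is case-insensitive unless you say otherwise.

```r
library(dplyr)

# names() shows which columns were selected, without printing every row.
# matches() ignores case by default, so "^sepal" still finds the Sepal columns.
loose_cols <- iris %>% select(matches("^sepal")) %>% names()
loose_cols

# With ignore.case = FALSE the lower-case pattern matches nothing,
# and select() returns a data frame with zero columns.
strict_cols <- iris %>% select(matches("^sepal", ignore.case = FALSE)) %>% names()
strict_cols
```

Looking at names() first is a cheap way to test a pattern before you pass the result on to kable().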
The filter() verb subsets a data frame based on one or more
conditions. Only the rows that satisfy the condition(s) are kept; all
other rows are filtered out. There are some functions and operators
that you should know when working with the filter() verb -
== (equal to), != (not equal to)
> or <, >= or <= (comparisons)
&, |, ! (and, or, not)
is.na() (test for missing values)
%in% (membership in a set of values)
Let’s see some examples -
# choose the rows whose Petal.Width is greater than 2
iris %>% filter(Petal.Width > 2) %>%
kable(align = "lccrr",caption = "iris data: Petal width greater than 2") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
# choose the rows for setosa Species
iris %>% filter(Species == "setosa") %>%
kable(align = "lccrr",caption = "iris data: setosa only") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
# or, equivalently, using %in%
iris %>% filter(Species %in% "setosa") %>%
kable(align = "lccrr",caption = "iris data: setosa only") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
# or the opposite: keep every row except setosa
iris %>% filter(Species != "setosa") %>%
kable(align = "lccrr",caption = "iris data: without setosa") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
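The examples above use one condition at a time. The sketch below combines conditions with &, |, %in% and is.na(). It assumes dplyr is loaded; the thresholds and the small toy data frame are made up for illustration (iris itself has no missing values).

```r
library(dplyr)

# & : both conditions must hold (separating conditions with a comma does the same)
big_setosa <- iris %>% filter(Species == "setosa" & Sepal.Length > 5.5)

# | : at least one of the conditions must hold
extreme <- iris %>% filter(Sepal.Length < 4.5 | Sepal.Length > 7.5)

# %in% with several values keeps rows that match any of them
two_species <- iris %>% filter(Species %in% c("setosa", "virginica"))

# !is.na() drops rows with a missing value in the chosen column;
# 'toy' is a hypothetical data frame created only for this example
toy <- data.frame(id = 1:4, value = c(2.5, NA, 3.1, NA))
complete_rows <- toy %>% filter(!is.na(value))

# Compare the number of rows kept by each filter
nrow(big_setosa)
nrow(extreme)
nrow(two_species)
nrow(complete_rows)
```

Checking nrow() after each filter is a quick sanity check that the condition keeps the rows you expect.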
The verb mutate() creates new columns. Often the values of
the new column are computed from the existing variables
(i.e. columns).
iris %>% mutate(Length_difference = Sepal.Length - Petal.Length) %>% # not that this new column makes much sense
kable(align = "lccrr",caption = "iris data: new column added") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Length_difference |
|---|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 3.7 |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 3.5 |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 3.4 |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa | 3.1 |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa | 3.6 |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa | 3.7 |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa | 3.2 |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa | 3.5 |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa | 3.0 |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa | 3.4 |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa | 3.9 |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa | 3.2 |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa | 3.4 |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa | 3.2 |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa | 4.6 |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa | 4.2 |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa | 4.1 |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa | 3.7 |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa | 4.0 |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa | 3.6 |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa | 3.7 |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa | 3.6 |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa | 3.6 |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa | 3.4 |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa | 2.9 |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa | 3.4 |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa | 3.4 |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa | 3.7 |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa | 3.8 |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa | 3.1 |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa | 3.2 |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa | 3.9 |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa | 3.7 |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa | 4.1 |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa | 3.4 |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa | 3.8 |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa | 4.2 |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa | 3.5 |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa | 3.1 |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa | 3.6 |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa | 3.7 |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa | 3.2 |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa | 3.1 |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa | 3.4 |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa | 3.2 |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa | 3.4 |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa | 3.5 |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa | 3.2 |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa | 3.8 |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa | 3.6 |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | 2.3 |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | 1.9 |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | 2.0 |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor | 1.5 |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor | 1.9 |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor | 1.2 |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor | 1.6 |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor | 1.6 |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor | 2.0 |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor | 1.3 |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor | 1.5 |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor | 1.7 |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor | 2.0 |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor | 1.4 |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor | 2.0 |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor | 2.3 |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor | 1.1 |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor | 1.7 |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor | 1.7 |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor | 1.7 |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor | 1.1 |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor | 2.1 |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor | 1.4 |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor | 1.4 |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor | 2.1 |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor | 2.2 |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor | 2.0 |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor | 1.7 |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor | 1.5 |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor | 2.2 |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor | 1.7 |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor | 1.8 |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor | 1.9 |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor | 0.9 |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor | 0.9 |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor | 1.5 |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor | 2.0 |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor | 1.9 |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor | 1.5 |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor | 1.5 |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor | 1.1 |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor | 1.5 |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor | 1.8 |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor | 1.7 |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor | 1.4 |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor | 1.5 |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor | 1.5 |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor | 1.9 |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor | 2.1 |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor | 1.6 |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica | 0.3 |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica | 0.7 |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica | 1.2 |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica | 0.7 |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica | 0.7 |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica | 1.0 |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica | 0.4 |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica | 1.0 |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica | 0.9 |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica | 1.1 |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica | 1.4 |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica | 1.1 |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica | 1.3 |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica | 0.7 |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica | 0.7 |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica | 1.1 |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica | 1.0 |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica | 1.0 |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica | 0.8 |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica | 1.0 |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica | 1.2 |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica | 0.7 |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica | 1.0 |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica | 1.4 |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica | 1.0 |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica | 1.2 |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica | 1.4 |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica | 1.2 |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica | 0.8 |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica | 1.4 |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica | 1.3 |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica | 1.5 |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica | 0.8 |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica | 1.2 |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica | 0.5 |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica | 1.6 |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica | 0.7 |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica | 0.9 |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica | 1.2 |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica | 1.5 |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica | 1.1 |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica | 1.8 |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica | 0.7 |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica | 0.9 |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica | 1.0 |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica | 1.5 |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica | 1.3 |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica | 1.3 |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica | 0.8 |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica | 0.8 |
# To keep only the newly created column, use transmute()
iris %>% transmute(Length_difference = Sepal.Length - Petal.Length) %>%
kable(align = "lccrr",caption = "iris data: new column only") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Length_difference |
|---|
| 3.7 |
| 3.5 |
| 3.4 |
| 3.1 |
| 3.6 |
| 3.7 |
| 3.2 |
| 3.5 |
| 3.0 |
| 3.4 |
| 3.9 |
| 3.2 |
| 3.4 |
| 3.2 |
| 4.6 |
| 4.2 |
| 4.1 |
| 3.7 |
| 4.0 |
| 3.6 |
| 3.7 |
| 3.6 |
| 3.6 |
| 3.4 |
| 2.9 |
| 3.4 |
| 3.4 |
| 3.7 |
| 3.8 |
| 3.1 |
| 3.2 |
| 3.9 |
| 3.7 |
| 4.1 |
| 3.4 |
| 3.8 |
| 4.2 |
| 3.5 |
| 3.1 |
| 3.6 |
| 3.7 |
| 3.2 |
| 3.1 |
| 3.4 |
| 3.2 |
| 3.4 |
| 3.5 |
| 3.2 |
| 3.8 |
| 3.6 |
| 2.3 |
| 1.9 |
| 2.0 |
| 1.5 |
| 1.9 |
| 1.2 |
| 1.6 |
| 1.6 |
| 2.0 |
| 1.3 |
| 1.5 |
| 1.7 |
| 2.0 |
| 1.4 |
| 2.0 |
| 2.3 |
| 1.1 |
| 1.7 |
| 1.7 |
| 1.7 |
| 1.1 |
| 2.1 |
| 1.4 |
| 1.4 |
| 2.1 |
| 2.2 |
| 2.0 |
| 1.7 |
| 1.5 |
| 2.2 |
| 1.7 |
| 1.8 |
| 1.9 |
| 0.9 |
| 0.9 |
| 1.5 |
| 2.0 |
| 1.9 |
| 1.5 |
| 1.5 |
| 1.1 |
| 1.5 |
| 1.8 |
| 1.7 |
| 1.4 |
| 1.5 |
| 1.5 |
| 1.9 |
| 2.1 |
| 1.6 |
| 0.3 |
| 0.7 |
| 1.2 |
| 0.7 |
| 0.7 |
| 1.0 |
| 0.4 |
| 1.0 |
| 0.9 |
| 1.1 |
| 1.4 |
| 1.1 |
| 1.3 |
| 0.7 |
| 0.7 |
| 1.1 |
| 1.0 |
| 1.0 |
| 0.8 |
| 1.0 |
| 1.2 |
| 0.7 |
| 1.0 |
| 1.4 |
| 1.0 |
| 1.2 |
| 1.4 |
| 1.2 |
| 0.8 |
| 1.4 |
| 1.3 |
| 1.5 |
| 0.8 |
| 1.2 |
| 0.5 |
| 1.6 |
| 0.7 |
| 0.9 |
| 1.2 |
| 1.5 |
| 1.1 |
| 1.8 |
| 0.7 |
| 0.9 |
| 1.0 |
| 1.5 |
| 1.3 |
| 1.3 |
| 0.8 |
| 0.8 |
Interestingly, setting an existing column to
NULL inside mutate() deletes that column.
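A minimal sketch of this behaviour, assuming dplyr is loaded (the object names are illustrative):

```r
library(dplyr)

# Assigning NULL to an existing column removes it from the result
iris_no_species <- iris %>% mutate(Species = NULL)
names(iris_no_species)

# select() with a minus sign leaves the same columns
iris_no_species2 <- iris %>% select(-Species)
identical(names(iris_no_species), names(iris_no_species2))
```

Both forms leave the original iris data frame untouched; they only change the copy returned by the pipeline.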
As the name suggests, the rename() verb changes the name of
an existing column. The syntax is
<new_name> = <old_name>. Example -
iris %>% rename(Species.name=Species) %>%
kable(align = "lccrr",caption = "iris data: Species column renamed") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species.name |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
Interestingly, you can also rename a column while selecting it
with the select() verb -
iris %>% select(Sepal.Length,
Sepal.Width,
Petal.Length,
Petal.Width,
Species.name=Species) %>%
kable(align = "lccrr",caption = "iris data: Species column renamed using select()") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species.name |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
The arrange() verb orders the rows of a data frame by the values of
the selected column(s), in ascending order by default. For example -
iris %>% arrange(Sepal.Length) %>%
kable(align = "lccrr",caption = "iris data: arranged by Sepal length") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
# The rows are first ordered by Sepal.Length; rows that share the same Sepal.Length are then ordered by Sepal.Width, and the rest of each row moves with it.
iris %>% arrange(Sepal.Length,Sepal.Width) %>%
kable(align = "lccrr",caption = "iris data: arranged by Sepal length and width") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
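The tie-breaking behaviour is easier to see on a handful of rows. The following is a minimal sketch on a made-up data frame (`toy` and its columns are hypothetical and not part of iris); it also uses `desc()`, the dplyr helper for descending order, which has not been shown above.

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- tibble(
  id     = c("a", "b", "c", "d"),
  length = c(5.0, 4.9, 5.0, 4.9),
  width  = c(3.6, 3.1, 3.0, 2.4)
)

# Order by length only; rows with equal length are not ordered by width
by_length <- toy %>% arrange(length)

# Order by length, then break ties by width
by_both <- toy %>% arrange(length, width)

# Descending length, ascending width within ties
by_desc <- toy %>% arrange(desc(length), width)

by_both$id   # "d" "b" "c" "a"
by_desc$id   # "c" "a" "d" "b"
```

Each extra column passed to arrange() only matters for rows that are tied on all the columns before it.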
The distinct() verb retains only the unique (distinct) rows of a data
frame, judged by the selected column(s). By default it returns only the
selected column(s); to keep the remaining columns as well, change the
.keep_all parameter from its default value FALSE to TRUE. Let’s
see some examples -
iris %>% distinct(Sepal.Length) %>%
kable(align = "lccrr",caption = "iris data: distinct Sepal length") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length |
|---|
| 5.1 |
| 4.9 |
| 4.7 |
| 4.6 |
| 5.0 |
| 5.4 |
| 4.4 |
| 4.8 |
| 4.3 |
| 5.8 |
| 5.7 |
| 5.2 |
| 5.5 |
| 4.5 |
| 5.3 |
| 7.0 |
| 6.4 |
| 6.9 |
| 6.5 |
| 6.3 |
| 6.6 |
| 5.9 |
| 6.0 |
| 6.1 |
| 5.6 |
| 6.7 |
| 6.2 |
| 6.8 |
| 7.1 |
| 7.6 |
| 7.3 |
| 7.2 |
| 7.7 |
| 7.4 |
| 7.9 |
# here only the unique combinations of Sepal.Length and Sepal.Width are kept.
iris %>% distinct(Sepal.Length,Sepal.Width) %>%
kable(align = "lccrr",caption = "iris data: distinct Sepal length and width only") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| 4.6 | 3.4 |
| 5.0 | 3.4 |
| 4.4 | 2.9 |
| 4.9 | 3.1 |
| 5.4 | 3.7 |
| 4.8 | 3.4 |
| 4.8 | 3.0 |
| 4.3 | 3.0 |
| 5.8 | 4.0 |
| 5.7 | 4.4 |
| 5.7 | 3.8 |
| 5.1 | 3.8 |
| 5.4 | 3.4 |
| 5.1 | 3.7 |
| 4.6 | 3.6 |
| 5.1 | 3.3 |
| 5.0 | 3.0 |
| 5.2 | 3.5 |
| 5.2 | 3.4 |
| 4.8 | 3.1 |
| 5.2 | 4.1 |
| 5.5 | 4.2 |
| 5.0 | 3.2 |
| 5.5 | 3.5 |
| 4.9 | 3.6 |
| 4.4 | 3.0 |
| 5.1 | 3.4 |
| 5.0 | 3.5 |
| 4.5 | 2.3 |
| 4.4 | 3.2 |
| 4.6 | 3.2 |
| 5.3 | 3.7 |
| 5.0 | 3.3 |
| 7.0 | 3.2 |
| 6.4 | 3.2 |
| 6.9 | 3.1 |
| 5.5 | 2.3 |
| 6.5 | 2.8 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 4.9 | 2.4 |
| 6.6 | 2.9 |
| 5.2 | 2.7 |
| 5.0 | 2.0 |
| 5.9 | 3.0 |
| 6.0 | 2.2 |
| 6.1 | 2.9 |
| 5.6 | 2.9 |
| 6.7 | 3.1 |
| 5.6 | 3.0 |
| 5.8 | 2.7 |
| 6.2 | 2.2 |
| 5.6 | 2.5 |
| 5.9 | 3.2 |
| 6.1 | 2.8 |
| 6.3 | 2.5 |
| 6.4 | 2.9 |
| 6.6 | 3.0 |
| 6.8 | 2.8 |
| 6.7 | 3.0 |
| 6.0 | 2.9 |
| 5.7 | 2.6 |
| 5.5 | 2.4 |
| 6.0 | 2.7 |
| 5.4 | 3.0 |
| 6.0 | 3.4 |
| 6.3 | 2.3 |
| 5.5 | 2.5 |
| 5.5 | 2.6 |
| 6.1 | 3.0 |
| 5.8 | 2.6 |
| 5.0 | 2.3 |
| 5.6 | 2.7 |
| 5.7 | 3.0 |
| 5.7 | 2.9 |
| 6.2 | 2.9 |
| 5.1 | 2.5 |
| 7.1 | 3.0 |
| 6.3 | 2.9 |
| 6.5 | 3.0 |
| 7.6 | 3.0 |
| 4.9 | 2.5 |
| 7.3 | 2.9 |
| 6.7 | 2.5 |
| 7.2 | 3.6 |
| 6.5 | 3.2 |
| 6.4 | 2.7 |
| 6.8 | 3.0 |
| 5.7 | 2.5 |
| 5.8 | 2.8 |
| 7.7 | 3.8 |
| 7.7 | 2.6 |
| 6.9 | 3.2 |
| 5.6 | 2.8 |
| 7.7 | 2.8 |
| 6.3 | 2.7 |
| 6.7 | 3.3 |
| 7.2 | 3.2 |
| 6.2 | 2.8 |
| 6.4 | 2.8 |
| 7.2 | 3.0 |
| 7.4 | 2.8 |
| 7.9 | 3.8 |
| 6.3 | 2.8 |
| 6.1 | 2.6 |
| 7.7 | 3.0 |
| 6.3 | 3.4 |
| 6.4 | 3.1 |
| 6.0 | 3.0 |
| 6.8 | 3.2 |
| 6.2 | 3.4 |
# with .keep_all = TRUE the rest of the columns are also returned (taken from the first row of each combination).
iris %>% distinct(Sepal.Length,Sepal.Width, .keep_all = TRUE) %>%
kable(align = "lccrr",caption = "iris data: distinct Sepal length and width, all columns kept") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
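To see what .keep_all does to the duplicated rows, here is a small sketch on a made-up data frame (`toy` and its column names are hypothetical). The point to notice is that, for each distinct combination, only the first matching row survives.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 share the same (length, width) pair
toy <- tibble(
  length  = c(5.1, 5.1, 4.9, 5.1),
  width   = c(3.5, 3.5, 3.0, 3.8),
  species = c("setosa", "versicolor", "setosa", "setosa")
)

# Only the selected columns are returned: 3 unique (length, width) pairs
pairs_only <- toy %>% distinct(length, width)

# All columns are kept; for the duplicated pair the first row ("setosa") is retained
pairs_all <- toy %>% distinct(length, width, .keep_all = TRUE)

# With no columns named, whole rows are compared, so all 4 rows are distinct
whole_rows <- toy %>% distinct()
```

Because the "versicolor" row is silently dropped in `pairs_all`, it is worth checking which columns define a duplicate before relying on .keep_all = TRUE.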
The slice() verb lets you index rows by their (integer)
locations. It has some helpers too -
slice_head() selects the first row, while
slice_tail() selects the last. The same can be done using
slice(1) and slice(n()).
slice_head(n = <int>) selects the first
<int> rows, while
slice_tail(n = <int>) selects the last
<int> rows (the argument n has to be named).
slice_sample() selects rows at random.
slice_min() and slice_max() select the
rows with the lowest and the highest value of the selected variable. A few
examples -
iris %>% slice(1) %>%
kable(align = "lccrr",caption = "iris data: the first row") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
iris %>% slice(10:n()) %>%
kable(align = "lccrr",caption = "iris data: from 10th row to the end") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
iris %>% slice_min(Sepal.Length) %>%
kable(align = "lccrr",caption = "iris data: row with the lowest sepal length") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.3 | 3 | 1.1 | 0.1 | setosa |
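The remaining helpers can be sketched in the same way. This is a minimal example on iris, not an exhaustive tour; the object names (`first3`, `last3` and so on) are made up here, and `set.seed(1)` is only an assumption to make the random draw repeatable.

```r
library(dplyr)

first3 <- iris %>% slice_head(n = 3)   # rows 1 to 3
last3  <- iris %>% slice_tail(n = 3)   # rows 148 to 150

# slice_min() and slice_max() keep ties by default (with_ties = TRUE),
# so they can return more rows than n
longest   <- iris %>% slice_max(Sepal.Length, n = 1)
shortest2 <- iris %>% slice_min(Sepal.Length, n = 2, with_ties = FALSE)

# slice_sample() is random; setting a seed makes the draw reproducible
set.seed(1)
random5 <- iris %>% slice_sample(n = 5)
```

Setting with_ties = FALSE is useful when exactly n rows are needed, since several flowers in iris share the same sepal length.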
A disclaimer: there’s no verb (exactly) called join() in
dplyr (at least, to date). However, there are two types of join verbs -
inner_join() and the outer joins (outer_join is also not
a verb, but a class of three verbs: left_join(),
right_join() and full_join()). Join verbs
combine the columns of two data frames based on a common key column.
The inner_join() verb joins two data frames and retains only the
rows whose keys match in both. This means that there is a potential loss of
observations, which we may not want in a real-life analysis.
On the other hand, if we have two data frames x and
y, the left_join() verb matches the keys of
x and y, keeps all the rows from
x and joins the matched rows from y. Cells with no
match in y are filled with NA values. The right_join() verb is
the opposite scenario: it keeps all the rows from y. Finally, the full_join()
verb retains all the rows from both data frames, and empty cells are
filled with NA values. Let’s clarify the concept with some examples -
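As a warm-up before the iris examples, here is a minimal sketch with two tiny made-up data frames whose keys are unique (`x_small`, `y_small` and their columns are hypothetical). With unique keys the four verbs differ only in which rows they keep.

```r
library(dplyr)

# Hypothetical tables: key 1 exists only in x_small, key 4 only in y_small
x_small <- tibble(key = c(1, 2, 3), x_val = c("x1", "x2", "x3"))
y_small <- tibble(key = c(2, 3, 4), y_val = c("y2", "y3", "y4"))

inner <- inner_join(x_small, y_small, by = "key")  # keys 2, 3
left  <- left_join(x_small, y_small, by = "key")   # keys 1, 2, 3; y_val is NA for key 1
right <- right_join(x_small, y_small, by = "key")  # keys 2, 3, 4; x_val is NA for key 4
full  <- full_join(x_small, y_small, by = "key")   # keys 1, 2, 3, 4
```

Comparing nrow() of the four results (2, 3, 3 and 4 here) is a quick way to check how many observations a join has dropped or padded with NA.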
x <- iris %>% select(Sepal.Length,Sepal.Width,Species) %>% filter(Species %in% c("setosa", "versicolor")) %>% slice_sample(n=10)
y <- iris %>% select(Petal.Length,Petal.Width,Species) %>% filter(Species %in% c("versicolor", "virginica")) %>% slice_sample(n=10)
x %>% inner_join(y, by = "Species") %>%
kable(align = "lccrr",caption = "iris data: inner_join") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
## Warning in inner_join(., y, by = "Species"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 6.0 | 2.7 | versicolor | 4.9 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.6 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.3 | 1.0 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.5 | 1.0 |
| 5.6 | 3.0 | versicolor | 4.9 | 1.5 |
| 5.6 | 3.0 | versicolor | 4.6 | 1.4 |
| 5.6 | 3.0 | versicolor | 3.9 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.3 | 1.0 |
| 5.6 | 3.0 | versicolor | 4.7 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.5 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.9 | 1.5 |
| 6.7 | 3.1 | versicolor | 4.6 | 1.4 |
| 6.7 | 3.1 | versicolor | 3.9 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.3 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.7 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.5 | 1.0 |
| 6.1 | 2.9 | versicolor | 4.9 | 1.5 |
| 6.1 | 2.9 | versicolor | 4.6 | 1.4 |
| 6.1 | 2.9 | versicolor | 3.9 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.3 | 1.0 |
| 6.1 | 2.9 | versicolor | 4.7 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.5 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.9 | 1.5 |
| 5.2 | 2.7 | versicolor | 4.6 | 1.4 |
| 5.2 | 2.7 | versicolor | 3.9 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.3 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.7 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.5 | 1.0 |
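The warning above appears because Species is not a unique key: in this particular random draw, each of the 5 versicolor rows of x is paired with each of the 6 versicolor rows of y, which gives the 30 rows in the table. A minimal sketch of how one might check this before joining, using small made-up data frames (`x_toy` and `y_toy` are hypothetical) and the `relationship` argument that the warning itself suggests:

```r
library(dplyr)

# Hypothetical toy data with a repeated key on both sides
x_toy <- tibble(Species = c("setosa", "versicolor", "versicolor"),
                Sepal.Length = c(5.1, 6.0, 5.6))
y_toy <- tibble(Species = c("versicolor", "versicolor", "virginica"),
                Petal.Length = c(4.9, 4.6, 5.8))

# Rows each key will contribute to the inner join: n_x * n_y
expected <- inner_join(count(x_toy, Species, name = "n_x"),
                       count(y_toy, Species, name = "n_y"),
                       by = "Species") %>%
  mutate(n_joined = n_x * n_y)

# Declaring the relationship silences the warning when the duplication is intended
joined <- inner_join(x_toy, y_toy, by = "Species",
                     relationship = "many-to-many")

nrow(joined)   # 4, i.e. 2 versicolor rows in x_toy times 2 in y_toy
```

If the row multiplication is not intended, the usual remedy is to join on a key that identifies each row uniquely rather than to silence the warning.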
x %>% left_join(y, by = "Species") %>%
kable(align = "lccrr",caption = "iris data: left_join") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
## Warning in left_join(., y, by = "Species"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 5.1 | 3.8 | setosa | NA | NA |
| 6.0 | 2.7 | versicolor | 4.9 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.6 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.3 | 1.0 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.5 | 1.0 |
| 4.5 | 2.3 | setosa | NA | NA |
| 4.6 | 3.6 | setosa | NA | NA |
| 5.6 | 3.0 | versicolor | 4.9 | 1.5 |
| 5.6 | 3.0 | versicolor | 4.6 | 1.4 |
| 5.6 | 3.0 | versicolor | 3.9 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.3 | 1.0 |
| 5.6 | 3.0 | versicolor | 4.7 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.5 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.9 | 1.5 |
| 6.7 | 3.1 | versicolor | 4.6 | 1.4 |
| 6.7 | 3.1 | versicolor | 3.9 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.3 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.7 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.5 | 1.0 |
| 5.4 | 3.4 | setosa | NA | NA |
| 6.1 | 2.9 | versicolor | 4.9 | 1.5 |
| 6.1 | 2.9 | versicolor | 4.6 | 1.4 |
| 6.1 | 2.9 | versicolor | 3.9 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.3 | 1.0 |
| 6.1 | 2.9 | versicolor | 4.7 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.5 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.9 | 1.5 |
| 5.2 | 2.7 | versicolor | 4.6 | 1.4 |
| 5.2 | 2.7 | versicolor | 3.9 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.3 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.7 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.5 | 1.0 |
| 4.8 | 3.0 | setosa | NA | NA |
x %>% right_join(y, by = "Species") %>%
kable(align = "lccrr",caption = "iris data: right_join") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
## Warning in right_join(., y, by = "Species"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 6.0 | 2.7 | versicolor | 4.9 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.6 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.3 | 1.0 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.5 | 1.0 |
| 5.6 | 3.0 | versicolor | 4.9 | 1.5 |
| 5.6 | 3.0 | versicolor | 4.6 | 1.4 |
| 5.6 | 3.0 | versicolor | 3.9 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.3 | 1.0 |
| 5.6 | 3.0 | versicolor | 4.7 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.5 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.9 | 1.5 |
| 6.7 | 3.1 | versicolor | 4.6 | 1.4 |
| 6.7 | 3.1 | versicolor | 3.9 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.3 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.7 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.5 | 1.0 |
| 6.1 | 2.9 | versicolor | 4.9 | 1.5 |
| 6.1 | 2.9 | versicolor | 4.6 | 1.4 |
| 6.1 | 2.9 | versicolor | 3.9 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.3 | 1.0 |
| 6.1 | 2.9 | versicolor | 4.7 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.5 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.9 | 1.5 |
| 5.2 | 2.7 | versicolor | 4.6 | 1.4 |
| 5.2 | 2.7 | versicolor | 3.9 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.3 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.7 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.5 | 1.0 |
| NA | NA | virginica | 5.9 | 2.1 |
| NA | NA | virginica | 5.6 | 2.4 |
| NA | NA | virginica | 6.6 | 2.1 |
| NA | NA | virginica | 5.6 | 1.8 |
x %>% full_join(y, by = "Species") %>%
kable(align = "lccrr",caption = "iris data: full_join") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
## Warning in full_join(., y, by = "Species"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 2 of `x` matches multiple rows in `y`.
## ℹ Row 2 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 5.1 | 3.8 | setosa | NA | NA |
| 6.0 | 2.7 | versicolor | 4.9 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.6 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.3 | 1.0 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.2 |
| 6.0 | 2.7 | versicolor | 3.5 | 1.0 |
| 4.5 | 2.3 | setosa | NA | NA |
| 4.6 | 3.6 | setosa | NA | NA |
| 5.6 | 3.0 | versicolor | 4.9 | 1.5 |
| 5.6 | 3.0 | versicolor | 4.6 | 1.4 |
| 5.6 | 3.0 | versicolor | 3.9 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.3 | 1.0 |
| 5.6 | 3.0 | versicolor | 4.7 | 1.2 |
| 5.6 | 3.0 | versicolor | 3.5 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.9 | 1.5 |
| 6.7 | 3.1 | versicolor | 4.6 | 1.4 |
| 6.7 | 3.1 | versicolor | 3.9 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.3 | 1.0 |
| 6.7 | 3.1 | versicolor | 4.7 | 1.2 |
| 6.7 | 3.1 | versicolor | 3.5 | 1.0 |
| 5.4 | 3.4 | setosa | NA | NA |
| 6.1 | 2.9 | versicolor | 4.9 | 1.5 |
| 6.1 | 2.9 | versicolor | 4.6 | 1.4 |
| 6.1 | 2.9 | versicolor | 3.9 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.3 | 1.0 |
| 6.1 | 2.9 | versicolor | 4.7 | 1.2 |
| 6.1 | 2.9 | versicolor | 3.5 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.9 | 1.5 |
| 5.2 | 2.7 | versicolor | 4.6 | 1.4 |
| 5.2 | 2.7 | versicolor | 3.9 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.3 | 1.0 |
| 5.2 | 2.7 | versicolor | 4.7 | 1.2 |
| 5.2 | 2.7 | versicolor | 3.5 | 1.0 |
| 4.8 | 3.0 | setosa | NA | NA |
| NA | NA | virginica | 5.9 | 2.1 |
| NA | NA | virginica | 5.6 | 2.4 |
| NA | NA | virginica | 6.6 | 2.1 |
| NA | NA | virginica | 5.6 | 1.8 |
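The many-to-many warning printed above appears because the key, Species, is repeated in both tables, so every matching row on one side is paired with every matching row on the other. Here is a minimal sketch of the three joins on two made-up toy tables (sepals and petals are hypothetical names, not the x and y used above), assuming dplyr 1.1.0 or later, where the relationship argument is available -

```r
library(dplyr)

# Hypothetical toy tables, only for illustration
sepals <- data.frame(
  Species      = c("setosa", "versicolor", "versicolor"),
  Sepal.Length = c(5.1, 6.0, 5.6)
)
petals <- data.frame(
  Species      = c("versicolor", "versicolor", "virginica"),
  Petal.Length = c(4.9, 4.6, 5.9)
)

# Declaring the relationship tells dplyr the duplication is intended,
# which silences the warning
left  <- left_join(sepals, petals, by = "Species", relationship = "many-to-many")
right <- right_join(sepals, petals, by = "Species", relationship = "many-to-many")
full  <- full_join(sepals, petals, by = "Species", relationship = "many-to-many")

nrow(left)   # 1 unmatched setosa row + 2 x 2 versicolor pairs = 5
nrow(right)  # 2 x 2 versicolor pairs + 1 unmatched virginica row = 5
nrow(full)   # 1 setosa + 4 versicolor + 1 virginica = 6
```

The row counts show the pattern seen in the large tables above: left_join() keeps the unmatched rows of the first table, right_join() those of the second, and full_join() keeps both.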
I will be describing the group_by() and
summarise() verbs together to show the effect of the
former. group_by() is the most important grouping verb in
dplyr. It takes one or more variables of the data frame to group by
-
iris %>% group_by(Species) %>%
kable(align = "lccrr",caption = "iris data: group_by Species") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
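The rendered table looks just like the plain iris data because grouping only attaches metadata to the data frame. A small sketch of how you might inspect that metadata (the variable name grouped is my own choice) -

```r
library(dplyr)

grouped <- iris %>% group_by(Species)

is_grouped_df(grouped)  # TRUE: the data frame now carries grouping information
group_vars(grouped)     # the name(s) of the grouping variable(s)
n_groups(grouped)       # number of groups, one per species
dim(grouped)            # rows and columns are unchanged compared with iris
```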
Apart from some added metadata, you don’t see any change in the structure of the iris data frame yet. Let’s select Sepal.Length and see the effect -
iris %>% group_by(Species) %>% select(Sepal.Length) %>%
kable(align = "lccrr",caption = "iris data: group by Species and selected by Sepal length") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
## Adding missing grouping variables: `Species`
| Species | Sepal.Length |
|---|---|
| setosa | 5.1 |
| setosa | 4.9 |
| setosa | 4.7 |
| setosa | 4.6 |
| setosa | 5.0 |
| setosa | 5.4 |
| setosa | 4.6 |
| setosa | 5.0 |
| setosa | 4.4 |
| setosa | 4.9 |
| setosa | 5.4 |
| setosa | 4.8 |
| setosa | 4.8 |
| setosa | 4.3 |
| setosa | 5.8 |
| setosa | 5.7 |
| setosa | 5.4 |
| setosa | 5.1 |
| setosa | 5.7 |
| setosa | 5.1 |
| setosa | 5.4 |
| setosa | 5.1 |
| setosa | 4.6 |
| setosa | 5.1 |
| setosa | 4.8 |
| setosa | 5.0 |
| setosa | 5.0 |
| setosa | 5.2 |
| setosa | 5.2 |
| setosa | 4.7 |
| setosa | 4.8 |
| setosa | 5.4 |
| setosa | 5.2 |
| setosa | 5.5 |
| setosa | 4.9 |
| setosa | 5.0 |
| setosa | 5.5 |
| setosa | 4.9 |
| setosa | 4.4 |
| setosa | 5.1 |
| setosa | 5.0 |
| setosa | 4.5 |
| setosa | 4.4 |
| setosa | 5.0 |
| setosa | 5.1 |
| setosa | 4.8 |
| setosa | 5.1 |
| setosa | 4.6 |
| setosa | 5.3 |
| setosa | 5.0 |
| versicolor | 7.0 |
| versicolor | 6.4 |
| versicolor | 6.9 |
| versicolor | 5.5 |
| versicolor | 6.5 |
| versicolor | 5.7 |
| versicolor | 6.3 |
| versicolor | 4.9 |
| versicolor | 6.6 |
| versicolor | 5.2 |
| versicolor | 5.0 |
| versicolor | 5.9 |
| versicolor | 6.0 |
| versicolor | 6.1 |
| versicolor | 5.6 |
| versicolor | 6.7 |
| versicolor | 5.6 |
| versicolor | 5.8 |
| versicolor | 6.2 |
| versicolor | 5.6 |
| versicolor | 5.9 |
| versicolor | 6.1 |
| versicolor | 6.3 |
| versicolor | 6.1 |
| versicolor | 6.4 |
| versicolor | 6.6 |
| versicolor | 6.8 |
| versicolor | 6.7 |
| versicolor | 6.0 |
| versicolor | 5.7 |
| versicolor | 5.5 |
| versicolor | 5.5 |
| versicolor | 5.8 |
| versicolor | 6.0 |
| versicolor | 5.4 |
| versicolor | 6.0 |
| versicolor | 6.7 |
| versicolor | 6.3 |
| versicolor | 5.6 |
| versicolor | 5.5 |
| versicolor | 5.5 |
| versicolor | 6.1 |
| versicolor | 5.8 |
| versicolor | 5.0 |
| versicolor | 5.6 |
| versicolor | 5.7 |
| versicolor | 5.7 |
| versicolor | 6.2 |
| versicolor | 5.1 |
| versicolor | 5.7 |
| virginica | 6.3 |
| virginica | 5.8 |
| virginica | 7.1 |
| virginica | 6.3 |
| virginica | 6.5 |
| virginica | 7.6 |
| virginica | 4.9 |
| virginica | 7.3 |
| virginica | 6.7 |
| virginica | 7.2 |
| virginica | 6.5 |
| virginica | 6.4 |
| virginica | 6.8 |
| virginica | 5.7 |
| virginica | 5.8 |
| virginica | 6.4 |
| virginica | 6.5 |
| virginica | 7.7 |
| virginica | 7.7 |
| virginica | 6.0 |
| virginica | 6.9 |
| virginica | 5.6 |
| virginica | 7.7 |
| virginica | 6.3 |
| virginica | 6.7 |
| virginica | 7.2 |
| virginica | 6.2 |
| virginica | 6.1 |
| virginica | 6.4 |
| virginica | 7.2 |
| virginica | 7.4 |
| virginica | 7.9 |
| virginica | 6.4 |
| virginica | 6.3 |
| virginica | 6.1 |
| virginica | 7.7 |
| virginica | 6.3 |
| virginica | 6.4 |
| virginica | 6.0 |
| virginica | 6.9 |
| virginica | 6.7 |
| virginica | 6.9 |
| virginica | 5.8 |
| virginica | 6.8 |
| virginica | 6.7 |
| virginica | 6.7 |
| virginica | 6.3 |
| virginica | 6.5 |
| virginica | 6.2 |
| virginica | 5.9 |
Though I selected only Sepal.Length, the Species
column also appears. That’s because the group_by() verb was
applied beforehand, and dplyr always keeps the grouping variables. But the most dramatic effect
can be seen in conjunction with the summarise() verb.
summarise() generates a new data frame and returns one
row (with the result, of course) for each combination of grouping
variables. If there are no grouping variables, the output has a single
row summarising all observations in the input. Now, let’s see the effect
of group_by() in conjunction with the summarise()
verb -
iris %>%
group_by(Species) %>%
select(Sepal.Length) %>%
summarise(count=n()) %>%
kable(align = "lccrr",caption = "iris data: summarised count by Species") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
## Adding missing grouping variables: `Species`
| Species | count |
|---|---|
| setosa | 50 |
| versicolor | 50 |
| virginica | 50 |
iris %>%
group_by(Species) %>%
select(Sepal.Length) %>%
summarise(mean_Sepal_length=mean(Sepal.Length)) %>%
kable(align = "lccrr",caption = "iris data: Summarised mean Sepal length by Species") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
## Adding missing grouping variables: `Species`
| Species | mean_Sepal_length |
|---|---|
| setosa | 5.006 |
| versicolor | 5.936 |
| virginica | 6.588 |
# as mentioned above, without any grouping -
iris %>%
select(Sepal.Length) %>%
summarise(mean_Sepal_length=mean(Sepal.Length)) %>%
kable(align = "lccrr",caption = "iris data: summarised mean Sepal length without grouping") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "200px")
| mean_Sepal_length |
|---|
| 5.843333 |
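summarise() can compute several statistics in one call, and the select() step used above is optional because summarise() only returns the grouping variables and the new summary columns. A minimal sketch (the column names count, mean_sepal_length and sd_sepal_length are my own choice) -

```r
library(dplyr)

summary_by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    count             = n(),                 # rows per species
    mean_sepal_length = mean(Sepal.Length),  # same means as in the table above
    sd_sepal_length   = sd(Sepal.Length)     # spread within each species
  )

summary_by_species
```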
ggplot2
In my opinion, the most elegant package for data visualisation in
R is ggplot2. Here, gg stands for the
grammar of graphics. Put aside what you have learnt so far about
basic R plotting techniques; ggplot2 defines the
art of plotting in a whole new way. The learning curve may be steep, but
once you learn it, you will fall in love with it (I promise). You
provide the data, tell ggplot2 which variables to map to
the aesthetics, and tell it what you want. ggplot2 will
take care of the rest.
The easiest way to get ggplot2 is to install the whole
tidyverse:
#install.packages("tidyverse")
Alternatively, install just ggplot2:
#install.packages("ggplot2")
Or the development version from GitHub:
#install.packages("devtools")
#devtools::install_github("tidyverse/ggplot2")
And then, load it …
library(ggplot2)
In this chapter, I will be using the mtcars dataset for
plotting different graphs. For refreshing your memory, let’s have a look
at the dataset -
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Now, I will draw a scatter plot using the base R
plot() function, and then using ggplot2, I
will show you the difference -
plot(x=mtcars$mpg, y=mtcars$wt)
ggplot(data = mtcars, mapping = aes(x=mpg,y=wt)) + geom_point()
You can see the stark difference between them -
For plotting with ggplot2, you start with the
ggplot() function and provide the data. You then supply
what the plot needs, such as the aesthetic mapping using
mapping = aes(). Then, you add layers (like
geom_point()), scales (like
scale_x_continuous()), faceting specifications (like
facet_wrap()), and coordinate systems (like
coord_flip()).
In short, these are the elements that you might see in a block of
plotting code using the ggplot() function -
data
aesthetic mapping
geometric objects
statistical transformations
scales
coordinate systems
position adjustments
faceting
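To make the list concrete, here is a rough sketch that puts several of these elements into a single plot of the mtcars data. The particular choices (faceting by cyl, the axis label, the variable name p_elements) are only for illustration -

```r
library(ggplot2)

p_elements <- ggplot(data = mtcars,                    # data
                     mapping = aes(x = mpg, y = wt)) + # aesthetic mapping
  geom_point() +                                       # geometric object
  scale_x_continuous(name = "Miles per gallon") +      # scale
  facet_wrap(~ cyl) +                                  # faceting, one panel per cylinder count
  coord_flip()                                         # coordinate system, swaps the two axes

p_elements
```

Each element is a separate piece of code, so you can drop or swap any one of them without touching the others.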
You can specify different layers of the plot and combine them using the “+”
operator. Now I will dive into different aspects of the
ggplot() function -
aes()
Here, aesthetic means something that you can see. It is mainly the mapping between a visual attribute and a variable. These are some important aesthetics -
position (x,y)
colour (basically the colour of the outer rim of the object)
fill (the filling-colour/inside-colour of the object)
shape (mainly of point)
line type
size etc
You can read all about them in your RStudio help panel by typing -
help.search("geom_", package = "ggplot2")
geom_
There are many geoms in ggplot2, like -
geom_point()
geom_line()
geom_boxplot()
Again, you can find the geoms by typing in -
help.search("geom_", package = "ggplot2")
Now it’s time to try out what I have just mentioned, but before that (as
usual) let’s check the data that we are going to use. I will switch to
another dataset, called mpg, which ships with ggplot2.
?mpg
I will now draw a scatter plot using highway miles per gallon as a function of engine displacement (in litres) -
ggplot(data=mpg, aes(x=displ, y=hwy)) + geom_point()
Interestingly, you can save the whole or part of a plot in a variable -
# a plot can be saved in a variable first, then printed. Like -
p1 <- ggplot(data=mpg, aes(x=displ, y=hwy)) + geom_point()
p1
# or
p <- ggplot(data=mpg, aes(x=displ, y=hwy)) # saved as a base plot variable. I will call p and add different layer on it.
p2 <- p + geom_point()
p3 <- p + geom_line()
p4 <- p + geom_smooth()
p5 <- p2 + geom_smooth(se = F, linetype="dashed")
p5
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Now let’s play with colour and size -
p + geom_point(colour="red", alpha = 0.2, size = 3) # outside aes(), affects the same for all
p + geom_point(aes(colour=year, shape=factor(cyl)), size = 3) # inside aes(), affects accordingly
You can play with title and axis labels -
p +
geom_point(aes(colour=year), size = 3, alpha = 0.2) +
#geom_text(aes(label=model)) + # may be not a good idea now.
labs(
title = "Fuel efficiency vs Engine displacement",
subtitle = "Fuel efficiency decreases with the engine size",
caption = "Two-seater is an exception",
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
colour = "Manufacture year"
)
If your datapoints are a bit tightly spaced, you can jitter a bit -
p +
geom_point(aes(colour=class), size = 3, position = "jitter") # introducing jitter here. For controlling the amount of movements, you can use geom_jitter()
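As the comment above hints, geom_jitter() lets you control how far the points are moved. A small sketch, where the width and height values and the seed are arbitrary choices of mine -

```r
library(ggplot2)

set.seed(1)  # jitter is random; fixing the seed makes the plot reproducible

p_jitter <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_jitter(aes(colour = class),
              width  = 0.1,   # maximum horizontal shift, in units of displ
              height = 0.5,   # maximum vertical shift, in units of hwy
              size   = 3)

p_jitter
```

Keep the shifts small relative to the spacing of your data, otherwise the jitter starts to misrepresent the values.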
Let’s play with some scaling -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
scale_x_continuous(name = "x-axis label changed", breaks = seq(0,10,by=5),limits = c(0,10)) +
scale_y_continuous(trans = "reverse")
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
scale_colour_brewer(palette = "Set1") # scale_colour is a widely used one
You can play with the positioning of the legend, too -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
theme(legend.position = "left")
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
theme(legend.position = "none")
#### Coordinate system
I will discuss it with the box plot later in this chapter.
If you have too many data points, faceting splits the plot into sub-plots, one for each value of an appropriate variable -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
facet_wrap(~ class, ncol = 2)
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
facet_grid(~ class) # if there were any blank plot, won't be plotted here
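facet_grid() becomes more interesting with two variables, one for the rows and one for the columns. A minimal sketch using drv (drive train) and cyl (number of cylinders) from the same mpg data; the choice of these two variables is mine -

```r
library(ggplot2)

p_grid <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class), size = 3, alpha = 0.2) +
  facet_grid(drv ~ cyl)   # rows: drive train, columns: number of cylinders

p_grid
```

Unlike facet_wrap(), the grid keeps a panel for every row and column combination, so combinations without data show up as empty panels.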
There are different themes to play with -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
theme_void()
By default, the bar plot comes as stacked. If you fill it by a variable that is not used to plot the bars, you can see what I mean. However, for playing with the bar plot, I will be using another dataset called ‘diamonds’ that comes with ggplot2.
To begin with -
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut))
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=cut))
But -
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity))
The position is adjusted by the position argument. Besides the default “stack”, three common options are “identity”, “fill”, and “dodge”
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "identity")
Here, each object falls exactly where it should be in the context of the plot, so the bars overlap. It gets a little better if you use fill = NA or an alpha value
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "identity", alpha = 0.2)
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, colour=clarity), position = "identity", fill=NA) # mind the change of colour and fill
Position fill stretches each bar to the full height of the plot and displays the values as fractions
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "fill")
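If you want the numbers behind such a plot, the dplyr verbs from the first half of this workshop can reproduce them. This is a sketch of the idea rather than what ggplot2 does internally; the names clarity_share and fraction are my own -

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

clarity_share <- diamonds %>%
  count(cut, clarity) %>%             # number of diamonds per cut and clarity
  group_by(cut) %>%
  mutate(fraction = n / sum(n)) %>%   # share of each clarity within its cut
  ungroup()

head(clarity_share)
```

Within each cut the fractions add up to 1, which is why every bar reaches the same height when position = "fill" is used.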
But what we usually mean by a bar plot is the next one -
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "dodge")
A box plot is very convenient for seeing the distribution of your data and comparing the distributions of different groups side by side -
ggplot(mpg, aes(class, hwy)) +
geom_boxplot() +
coord_flip()
ggplot(mpg, aes(class, hwy)) +
geom_boxplot() +
coord_polar()
# Please don't draw box plots this way in real life.
The link: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
Correlation
The following plots help to examine how well correlated two variables are.
Scatterplot
The most frequently used plot for data analysis is undoubtedly the scatterplot. Whenever you want to understand the nature of relationship between two variables, invariably the first choice is the scatterplot.
It can be drawn using geom_point(). Additionally,
geom_smooth(), which draws a smoothing line (based on loess) by default,
can be tweaked to draw the line of best fit by setting method = "lm".
# install.packages("ggplot2")
# load package and data
options(scipen=999) # turn-off scientific notation like 1e+48
library(ggplot2)
theme_set(theme_bw()) # pre-set the bw theme.
data("midwest", package = "ggplot2")
# midwest <- read.csv("http://goo.gl/G1K41K") # bkup data source
# Scatterplot
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
geom_point(aes(col=state, size=popdensity)) +
geom_smooth(method="loess", se=F) +
xlim(c(0, 0.1)) +
ylim(c(0, 500000)) +
labs(subtitle="Area Vs Population",
y="Population",
x="Area",
title="Scatterplot",
caption = "Source: midwest")
plot(gg)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 15 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 15 rows containing missing values (`geom_point()`).
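The text above mentions method = "lm", so here is a hedged variant of the same plot with a straight line of best fit. I have also swapped xlim() and ylim() for coord_cartesian(), which is my own change: xlim() and ylim() drop the points outside the limits before the line is fitted (hence the "Removed 15 rows" warnings), whereas coord_cartesian() only zooms in and fits the line to all the data -

```r
library(ggplot2)
data("midwest", package = "ggplot2")

gg_lm <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(aes(col = state, size = popdensity)) +
  geom_smooth(method = "lm", se = FALSE) +                   # line of best fit
  coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 500000)) +   # zoom without dropping rows
  labs(subtitle = "Area Vs Population",
       y = "Population",
       x = "Area",
       title = "Scatterplot with a linear fit",
       caption = "Source: midwest")

gg_lm
```

Which of the two behaviours you want depends on whether the points outside the window should influence the fitted line.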
Now it’s our turn to apply the techniques that we have learned so far in this workshop. In this section, we will explore some datasets from a study characterising the genomic mutations (SNVs and CNAs) and gene expression profiles of over 2000 primary breast tumours. In addition, detailed clinical information for this study is available alongside the experimental data from cBioPortal. The study was described in two prominent publications -
Curtis et al., Nature 486:346-52, 2012
Pereira et al., Nature Communications 7:11479, 2016
FYI, the gene expression data were generated using microarrays, genome-wide copy number profiles were obtained using SNP microarrays, and targeted sequencing was performed using a panel of 40 driver-mutation genes to detect mutations (single nucleotide variants).
Let’s download the data and save it in the workshop2
folder. We will be plotting different aspects of the patient-related
information and the underlying biology for the sake of exploratory data
analysis (EDA). For that, we will have to merge and format the
data provided. Now, let’s load the data one by one using the function
read.delim with appropriate parameters -
library(dplyr)
library(ggplot2)
# Load patient data and explore a few of the columns (e.g. BREAST_SURGERY, CELLULARITY,CHEMOTHERAPY, ER_IHC ) -
patient_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_clinical_patient.txt",comment.char = "#", sep = "\t")
patient_data %>% pull(BREAST_SURGERY) %>% table
## .
## BREAST CONSERVING MASTECTOMY
## 554 785 1170
patient_data %>% pull(CELLULARITY) %>% table
## .
## High Low Moderate
## 592 965 215 737
patient_data %>% pull(CHEMOTHERAPY) %>% table
## .
## NO YES
## 529 1568 412
patient_data %>% pull(ER_IHC) %>% table
## .
## Negative Positve
## 83 609 1817
# Load sample data and explore the ER_STATUS
sample_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_clinical_sample.txt",comment.char = "#", sep = "\t")
sample_data %>% pull(ER_STATUS) %>% table
## .
## Negative Positive
## 644 1825
# Load CNA data and explore
CNA_data <- read.table("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_cna.txt",header = T, sep = "\t") %>%
select(-Entrez_Gene_Id) %>%
distinct(Hugo_Symbol, .keep_all = T)
CNA_data[1:10, 1:10]
## Hugo_Symbol MB.0000 MB.0039 MB.0045 MB.0046 MB.0048 MB.0050 MB.0053 MB.0062
## 1 A1BG 0 0 -1 0 0 0 0 -1
## 2 A1BG-AS1 0 0 -1 0 0 0 0 -1
## 3 A1CF 0 0 0 0 1 0 0 0
## 4 A2M 0 0 -1 -1 0 0 0 2
## 5 A2M-AS1 0 0 -1 -1 0 0 0 2
## 6 A2ML1 0 0 -1 -1 0 0 0 2
## 7 A2MP1 0 0 -1 -1 0 0 0 2
## 8 A3GALT2 0 0 0 0 0 0 0 -1
## 9 A4GALT 0 0 0 -1 -1 -1 0 1
## 10 A4GNT 0 0 2 0 0 0 1 1
## MB.0064
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## 7 0
## 8 0
## 9 0
## 10 0
# Load mutation data and explore
mutation_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_mutations.txt",comment.char = "#", sep = "\t")
mutation_data %>% head()
## Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
## 1 TP53 NA METABRIC GRCh37 17 7579344
## 2 TP53 NA METABRIC GRCh37 17 7579346
## 3 MLLT4 NA METABRIC GRCh37 6 168299111
## 4 NF2 NA METABRIC GRCh37 22 29999995
## 5 SF3B1 NA METABRIC GRCh37 2 198288682
## 6 NT5E NA METABRIC GRCh37 6 86195125
## End_Position Strand Consequence Variant_Classification
## 1 7579345 + frameshift_variant Frame_Shift_Ins
## 2 7579347 + protein_altering_variant In_Frame_Ins
## 3 168299111 + missense_variant Missense_Mutation
## 4 29999995 + missense_variant Missense_Mutation
## 5 198288682 + synonymous_variant Silent
## 6 86195125 + synonymous_variant Silent
## Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS
## 1 INS - - G NA
## 2 INS - - CAG NA
## 3 SNP G G T NA
## 4 SNP G G T NA
## 5 SNP A A T NA
## 6 SNP T T C NA
## dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode
## 1 NA MTS-T0058 NA
## 2 NA MTS-T0058 NA
## 3 NA MTS-T0058 NA
## 4 NA MTS-T0058 NA
## 5 NA MTS-T0059 NA
## 6 NA MTS-T0059 NA
## Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Tumor_Validation_Allele2 Match_Norm_Validation_Allele1
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## Match_Norm_Validation_Allele2 Verification_Status Validation_Status
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count
## 1 NA Illumina HiSeq 2,000 NA NA NA NA
## 2 NA Illumina HiSeq 2,000 NA NA NA NA
## 3 NA Illumina HiSeq 2,000 NA NA NA NA
## 4 NA Illumina HiSeq 2,000 NA NA NA NA
## 5 NA Illumina HiSeq 2,000 NA NA NA NA
## 6 NA Illumina HiSeq 2,000 NA NA NA NA
## HGVSc HGVSp HGVSp_Short
## 1 ENST00000269305.4:c.343dup p.His115ProfsTer34 p.H115Pfs*34
## 2 ENST00000269305.4:c.340_341insCTG p.Leu114delinsSerVal p.L114delinsSV
## 3 ENST00000392108.3:c.1544G>T p.Gly515Val p.G515V
## 4 ENST00000338641.4:c.8G>T p.Gly3Val p.G3V
## 5 ENST00000335508.6:c.45T>A p.Ile15= p.I15=
## 6 ENST00000257770.3:c.924T>C p.Ile308= p.I308=
## Transcript_ID RefSeq Protein_position Codons Hotspot
## 1 ENST00000269305 NM_001126112.2 114 -/C 0
## 2 ENST00000269305 NM_001126112.2 114 ttg/tCTGtg 0
## 3 ENST00000392108 NM_001040000.2 515 gGa/gTa 0
## 4 ENST00000338641 NM_000268.3 3 gGg/gTg 0
## 5 ENST00000335508 NM_012433.2 15 atT/atA 0
## 6 ENST00000257770 NM_002526.3 308 atT/atC 0
# Load expression data and explore
expression_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_mrna_agilent_microarray.txt",comment.char = "#", sep = "\t", header = T)
expression_data[1:10, 1:10]
## Hugo_Symbol Entrez_Gene_Id MB.0362 MB.0346 MB.0386 MB.0574 MB.0185
## 1 RERE 473 8.676978 9.653589 9.033589 8.814855 8.736406
## 2 RNF165 494470 6.075331 6.687887 5.910885 5.628740 6.392422
## 3 PHF7 51533 5.838270 5.600876 6.030718 5.849428 5.542133
## 4 CIDEA 1149 6.397503 5.246319 10.111816 6.116868 5.184098
## 5 TENT2 167153 7.906217 8.267256 7.959291 9.206376 8.162845
## 6 SLC17A3 10786 5.702379 5.521794 5.689533 5.439130 5.464326
## 7 SDS 10993 6.930741 6.141689 6.529312 6.430102 6.105427
## 8 ATP6V1C2 245973 5.332863 7.563477 5.482155 5.398675 5.026018
## 9 F3 2152 5.275676 5.376381 5.463788 5.409761 5.338580
## 10 FAM71C 196472 5.443896 5.319857 5.254294 5.512298 5.430874
## MB.0503 MB.0641 MB.0201
## 1 9.274265 9.286585 8.437347
## 2 5.908698 6.206729 6.095592
## 3 5.964661 5.783374 5.737572
## 4 7.828171 8.744149 5.480091
## 5 8.706646 8.518929 7.478413
## 6 5.417484 5.629885 5.686286
## 7 6.684893 5.632753 5.866132
## 8 5.266674 5.701353 6.403136
## 9 5.490693 5.363266 6.341856
## 10 5.363378 5.191612 5.208379
To begin with, let’s explore the mutation data a bit by
plotting the frequency of different types of mutations -
head(mutation_data)
## Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
## 1 TP53 NA METABRIC GRCh37 17 7579344
## 2 TP53 NA METABRIC GRCh37 17 7579346
## 3 MLLT4 NA METABRIC GRCh37 6 168299111
## 4 NF2 NA METABRIC GRCh37 22 29999995
## 5 SF3B1 NA METABRIC GRCh37 2 198288682
## 6 NT5E NA METABRIC GRCh37 6 86195125
## End_Position Strand Consequence Variant_Classification
## 1 7579345 + frameshift_variant Frame_Shift_Ins
## 2 7579347 + protein_altering_variant In_Frame_Ins
## 3 168299111 + missense_variant Missense_Mutation
## 4 29999995 + missense_variant Missense_Mutation
## 5 198288682 + synonymous_variant Silent
## 6 86195125 + synonymous_variant Silent
## Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS
## 1 INS - - G NA
## 2 INS - - CAG NA
## 3 SNP G G T NA
## 4 SNP G G T NA
## 5 SNP A A T NA
## 6 SNP T T C NA
## dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode
## 1 NA MTS-T0058 NA
## 2 NA MTS-T0058 NA
## 3 NA MTS-T0058 NA
## 4 NA MTS-T0058 NA
## 5 NA MTS-T0059 NA
## 6 NA MTS-T0059 NA
## Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Tumor_Validation_Allele2 Match_Norm_Validation_Allele1
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## Match_Norm_Validation_Allele2 Verification_Status Validation_Status
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count
## 1 NA Illumina HiSeq 2,000 NA NA NA NA
## 2 NA Illumina HiSeq 2,000 NA NA NA NA
## 3 NA Illumina HiSeq 2,000 NA NA NA NA
## 4 NA Illumina HiSeq 2,000 NA NA NA NA
## 5 NA Illumina HiSeq 2,000 NA NA NA NA
## 6 NA Illumina HiSeq 2,000 NA NA NA NA
## HGVSc HGVSp HGVSp_Short
## 1 ENST00000269305.4:c.343dup p.His115ProfsTer34 p.H115Pfs*34
## 2 ENST00000269305.4:c.340_341insCTG p.Leu114delinsSerVal p.L114delinsSV
## 3 ENST00000392108.3:c.1544G>T p.Gly515Val p.G515V
## 4 ENST00000338641.4:c.8G>T p.Gly3Val p.G3V
## 5 ENST00000335508.6:c.45T>A p.Ile15= p.I15=
## 6 ENST00000257770.3:c.924T>C p.Ile308= p.I308=
## Transcript_ID RefSeq Protein_position Codons Hotspot
## 1 ENST00000269305 NM_001126112.2 114 -/C 0
## 2 ENST00000269305 NM_001126112.2 114 ttg/tCTGtg 0
## 3 ENST00000392108 NM_001040000.2 515 gGa/gTa 0
## 4 ENST00000338641 NM_000268.3 3 gGg/gTg 0
## 5 ENST00000335508 NM_012433.2 15 atT/atA 0
## 6 ENST00000257770 NM_002526.3 308 atT/atC 0
ggplot(data=mutation_data,mapping = aes(Variant_Classification, fill=Variant_Classification)) +
geom_bar() +
coord_flip()
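geom_bar() counts the rows in each category for us. As a rough sketch of what it does behind the scenes, the same counts can be obtained with table(). The vector below is made up for illustration and only stands in for mutation_data$Variant_Classification.

```r
# Toy stand-in for mutation_data$Variant_Classification (values are made up)
variant_class <- c("Missense_Mutation", "Silent", "Missense_Mutation",
                   "Frame_Shift_Ins", "Missense_Mutation", "Silent")

# table() counts each class; sort() orders the counts from most to least frequent
variant_counts <- sort(table(variant_class), decreasing = TRUE)
print(variant_counts)

# Convert to a data frame, the shape geom_col() expects (one row per class)
variant_df <- data.frame(class = names(variant_counts),
                         n = as.integer(variant_counts))
print(variant_df)
```

With pre-computed counts like variant_df, geom_col() is the matching geom, because it plots the values as given instead of counting rows.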
Now we will build a word cloud of the genes that have been affected by mutations -
# install.packages("wordcloud")
library(wordcloud)
## Loading required package: RColorBrewer
# We need each gene name and how many times it is affected by a non-synonymous mutation -
mutation_wordcloud_data <- mutation_data %>%
filter(Consequence != "synonymous_variant") %>%
group_by(Hugo_Symbol) %>%
summarise(freq=n()) %>%
rename(word=Hugo_Symbol)
# Let's find out some highly affected genes -
ggplot(mutation_wordcloud_data %>% filter(freq > 100)) +
geom_col(aes(word, freq)) +
coord_flip()
# Now create the word cloud
wordcloud(words = mutation_wordcloud_data %>% pull(word),
          freq = mutation_wordcloud_data %>% pull(freq),
          scale = c(5, 0.5),          # Range of word sizes (largest, smallest)
          max.words = 100,            # Plot at most the 100 most frequent words
          random.order = FALSE,       # Plot words in decreasing order of frequency
          rot.per = 0.35,             # Proportion of words rotated by 90 degrees
          use.r.layout = TRUE,        # TRUE: collision detection in R; FALSE: in C++
          colors = brewer.pal(8, "Dark2"))
Now, we will subset the loaded data so that we can merge (or join) the pieces together later. We will create a new dataset containing -
Frequency of mutations per patient from mutation_data.
Expression data for selected (but important) genes: "GATA3","FOXA1","MLPH","ESR1","ERBB2","PGR","TP53","PIK3CA", "AKT1", "PTEN", "PIK3R1", "FOXO3","RB1", "KMT2C", "ARID1A", "NCOR1","CTCF","MAP3K1","NF1","CDH1","TBX3","CBFB","RUNX1", "USP9X","SF3B1"
A subset of sample_data using selected columns: PATIENT_ID, SAMPLE_ID, CANCER_TYPE, CANCER_TYPE_DETAILED, ER_STATUS, HER2_STATUS, PR_STATUS, GRADE.
A subset of patient_data using selected columns: PATIENT_ID, THREEGENE, AGE_AT_DIAGNOSIS, CELLULARITY, CHEMOTHERAPY, ER_IHC, HORMONE_THERAPY, INTCLUST, NPI, CLAUDIN_SUBTYPE.
Finally, we will combine all the data on the patient_ID column to create a master dataset that we will use in the rest of the workshop.
# Find out the frequency of mutations per patient
mutation_per_patient <- mutation_data %>%
filter(Consequence != "synonymous_variant") %>%
pull(Tumor_Sample_Barcode) %>%
table() %>%
data.frame() %>%
select(patient_ID = ".", Mutation_count=Freq)
# subsetting and formatting the expression data
sub_expression_data <- expression_data %>%
filter(Hugo_Symbol %in% c("GATA3","FOXA1","MLPH","ESR1","ERBB2","PGR","TP53","PIK3CA",
"AKT1", "PTEN", "PIK3R1", "FOXO3","RB1", "KMT2C", "ARID1A",
"NCOR1","CTCF","MAP3K1","NF1","CDH1","TBX3","CBFB","RUNX1",
"USP9X","SF3B1"))
rownames(sub_expression_data) <- sub_expression_data$Hugo_Symbol
sub_expression_data <- sub_expression_data %>%
select(-Hugo_Symbol,-Entrez_Gene_Id) %>%
t() %>%
data.frame() %>%
mutate(patient_ID = rownames(.))
# subsetting the sample_data
sub_sample_data <- sample_data %>%
select(patient_ID = PATIENT_ID,
sample_ID = SAMPLE_ID,
cancer_type = CANCER_TYPE,
cancer_type_detailed = CANCER_TYPE_DETAILED,
ER_status = ER_STATUS,
HER2_status = HER2_STATUS,
PR_status = PR_STATUS,
Neoplasm_Histologic_Grade = GRADE)
# subsetting the patient data
sub_patient_data <- patient_data %>%
select(patient_ID = PATIENT_ID,
Three_gene_classifier_subtype = THREEGENE,
Age_at_diagnosis = AGE_AT_DIAGNOSIS,
Cellularity = CELLULARITY,
Chemotherapy = CHEMOTHERAPY,
ER_status_measured_by_IHC = ER_IHC,
Hormone_therapy = HORMONE_THERAPY,
Integrative_cluster = INTCLUST,
Nottingham_prognostic_index = NPI,
PAM50 = CLAUDIN_SUBTYPE)
# let's combine the dataset
combined_data <- left_join(sub_patient_data,sub_sample_data, by="patient_ID")
combined_data <- left_join(combined_data, mutation_per_patient, by="patient_ID")
combined_data$patient_ID <- gsub("-",".",combined_data$patient_ID) # replace '-' with '.' in patient_ID so that it matches the IDs in sub_expression_data
combined_data <- left_join(combined_data,sub_expression_data, by="patient_ID")
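Why is the gsub() step needed before the last join? The sample columns of the expression table appear as MB.0362 rather than MB-0362, presumably because read.delim() passes column names through make.names() by default (check.names = TRUE). The sketch below illustrates the effect on a toy example. The IDs and values are made up, and base R's merge(..., all.x = TRUE) is used here as a stand-in for left_join().

```r
# Hypothetical clinical table with IDs written with a hyphen
clinical <- data.frame(patient_ID = c("MB-0362", "MB-0346", "MB-0386"),
                       ER_status  = c("Positive", "Negative", "Positive"))

# make.names() replaces characters that are not valid in R names with '.'
expr_ids <- make.names(c("MB-0362", "MB-0346"))
print(expr_ids)

# Hypothetical expression table keyed by the converted IDs
expression <- data.frame(patient_ID = expr_ids, GATA3 = c(9.1, 6.4))

# Without harmonising the IDs, no key matches and every GATA3 value is NA
unmatched <- merge(clinical, expression, by = "patient_ID", all.x = TRUE)
print(unmatched)

# After replacing '-' with '.', the keys line up
clinical$patient_ID <- gsub("-", ".", clinical$patient_ID, fixed = TRUE)
matched <- merge(clinical, expression, by = "patient_ID", all.x = TRUE)
print(matched)
```

A join never complains about keys that fail to match. It simply fills the new columns with NA, so it is worth checking how many rows actually matched after each join.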
Now, we will generate a scatter plot of the expression of the
estrogen receptor gene ESR1 against that of the transcription factor
GATA3. We will then summarise their co-expression by fitting a
linear model, and refine that model based on ER_status
(positive or negative) -
ggplot(data = combined_data) +
geom_point(mapping = aes(x = GATA3, y = ESR1))
## Warning: Removed 529 rows containing missing values (`geom_point()`).
ggplot(data = combined_data %>% na.omit(), aes(x = GATA3, y = ESR1)) +
geom_point() + geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(data = combined_data %>% na.omit()) +
geom_point(mapping = aes(x = GATA3, y = ESR1,colour = ER_status))
ggplot(data = combined_data %>% na.omit(), aes(x = GATA3, y = ESR1,colour = ER_status)) +
geom_point() + geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
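geom_smooth(method = "lm") draws the fitted line but does not show the model itself. As a minimal sketch, the same kind of fit can be inspected with lm(). The data below are simulated (not METABRIC), and the slopes 0.7 and 0.1 are arbitrary choices for illustration.

```r
set.seed(1)

# Simulated expression values for 60 hypothetical samples
toy <- data.frame(GATA3     = runif(60, min = 5, max = 11),
                  ER_status = rep(c("Positive", "Negative"), each = 30))
toy$ESR1 <- ifelse(toy$ER_status == "Positive",
                   4 + 0.7 * toy$GATA3,    # steeper relationship in ER+
                   5 + 0.1 * toy$GATA3) +  # flatter relationship in ER-
  rnorm(60, sd = 0.3)

# One line through all samples, as in the plot without a colour mapping
fit_all <- lm(ESR1 ~ GATA3, data = toy)
print(coef(fit_all))

# One line per ER status, as when colour = ER_status is set in the global aes()
fit_by_status <- lapply(split(toy, toy$ER_status),
                        function(d) coef(lm(ESR1 ~ GATA3, data = d)))
print(fit_by_status)
```

When a discrete variable such as ER_status is mapped to colour in the global aes(), ggplot2 groups the data by it, so geom_smooth() fits a separate line for each group.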
On a different note, GATA3 expression is usually high in the
Luminal A subtype of breast cancer and also in estrogen
receptor-positive (ER+) tumours (Voduc D et al.).
Let’s find out if that’s true for this study -
ggplot(combined_data, aes(PAM50, GATA3)) +
geom_boxplot()
## Warning: Removed 529 rows containing non-finite values (`stat_boxplot()`).
ggplot(combined_data %>% na.omit(), aes(ER_status, GATA3)) +
geom_boxplot()
ggplot(combined_data %>% na.omit(), aes(ER_status, GATA3)) +
geom_violin(aes(fill=ER_status))
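Several of the plots above use combined_data %>% na.omit(). One caveat: na.omit() drops a row if any column is missing, not only the columns being plotted. The sketch below shows the difference on a made-up table; complete.cases() restricted to the plotted columns is one possible alternative, not the approach used in this workshop.

```r
# Made-up data: Mutation_count is missing for two patients, GATA3 for one
toy <- data.frame(patient_ID     = c("P1", "P2", "P3", "P4"),
                  GATA3          = c(9.2, 6.1, NA, 8.7),
                  ER_status      = c("Positive", "Negative", "Positive", "Positive"),
                  Mutation_count = c(5, NA, 3, NA))

# na.omit() removes every row with an NA in ANY column
all_cols <- na.omit(toy)
print(nrow(all_cols))

# Checking only the plotted columns keeps rows that are complete for the plot
plot_cols <- toy[complete.cases(toy[, c("GATA3", "ER_status")]), ]
print(nrow(plot_cols))
```

It can be worth comparing nrow() before and after filtering to see how many patients a plot is really based on.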
Now, we will look at the distribution of age of the patients at diagnosis as a function of some selected mutated genes.
mut_gene <- mutation_data %>%
filter(Consequence != "synonymous_variant") %>%
select(gene=Hugo_Symbol,patient_ID=Tumor_Sample_Barcode )
patient_age <- patient_data %>% select(age=AGE_AT_DIAGNOSIS,patient_ID=PATIENT_ID)
plot_data <- left_join(mut_gene,patient_age,by="patient_ID") %>%
filter(gene %in% c("PIK3CA", "TP53", "GATA3", "CDH1", "MAP3K1", "CBFB", "SF3B1")) %>%
  mutate(age_cat = case_when(age < 45 ~ "<45",
                             age >= 45 & age < 55 ~ "45-54",   # '< 55' so that non-integer ages such as 54.5 are not left as NA
                             age >= 55 & age <= 64 ~ "55-64",
                             age > 64 ~ ">64")) %>%
na.omit()
plot_data$age_cat <- factor(plot_data$age_cat, ordered = T, levels = c(">64","55-64","45-54","<45"))
plot_data %>%
group_by(gene,age_cat) %>%
select(gene,age_cat) %>%
summarise(freq=n()) %>%
ggplot() +
geom_col(aes(gene,freq, fill=age_cat), position="fill", colour="black") +
scale_fill_manual(values=c("#568a48","#6fad76","#aac987","#e6ede3")) +
theme_classic()
## `summarise()` has grouped output by 'gene'. You can override using the
## `.groups` argument.
Can we distinguish any pattern from the plot?
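As an aside, the age binning can also be sketched with base R's cut(), which guarantees that the bins leave no gaps. The ages below are made up. This version assumes half-open bins, so an age of 64.5 lands in "55-64", whereas the case_when() call above sends every age above 64 to ">64".

```r
# Made-up ages, including values that sit on or between bin boundaries
age <- c(30.2, 44.9, 45, 54.6, 55, 64.5, 65, 80.1)

# right = FALSE makes each bin include its lower bound and exclude its upper bound:
# [-Inf, 45), [45, 55), [55, 65), [65, Inf)
age_cat <- cut(age,
               breaks = c(-Inf, 45, 55, 65, Inf),
               labels = c("<45", "45-54", "55-64", ">64"),
               right  = FALSE)

print(data.frame(age, age_cat))
print(table(age_cat))
```

cut() returns a factor whose levels follow the order of the labels, which is handy when the categories need a fixed order in a legend.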
Now, we will try to explore patterns of co-mutation and mutual exclusivity in a set of 21 driver genes (so-called Mut-driver genes) -
#install.packages("splitstackshape")
library(splitstackshape)
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
# create a gene-by-patient matrix recording which genes are mutated in each patient
mat <- t(splitstackshape:::charMat(listOfValues = split( mut_gene$gene,mut_gene$patient_ID), fill = 0L))
# set of 21 Mut-driver genes
mat_gene <- c("PIK3CA","AKT1","PTEN","PIK3R1","FOXO3", "RB1", "KMT2C", "ARID1A","NCOR1","CTCF", "TP53", "MAP3K1", "NF1","CDH1","GATA3","TBX3","CBFB","RUNX1","ERBB2","USP9X","SF3B1")
# create an empty matrix
mat_asso <- matrix(data=NA, nrow = length(mat_gene), ncol = length(mat_gene))
colnames(mat_asso) <- mat_gene
rownames(mat_asso) <- mat_gene
# fill in the cells with log odds ratio for each pairwise association test
for(i in mat_gene){
for(j in mat_gene){
mat_asso[i,j] <- fisher.test(mat[i,],mat[j,])$estimate %>% log()
}
}
# blank out the upper triangle and the diagonal of the matrix (set to 0) so that each gene pair is shown only once
mat_asso[upper.tri(mat_asso, diag = T)] <- 0
ggplot(melt(mat_asso), aes(Var1,Var2)) +
geom_tile(aes(fill=value), colour="white") +
scale_fill_gradient2(low = "#7c4d91", high = "#5e8761",mid = "white", limits = c(-2,2)) +
theme_classic() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
axis.line.x = element_blank(),
axis.line.y = element_blank()) +
coord_flip()
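To read the heatmap, it helps to see what a single cell means. The sketch below runs fisher.test() on made-up mutation indicators for 12 hypothetical patients. A positive log odds ratio suggests that two genes tend to be mutated together, and a negative one suggests mutual exclusivity.

```r
# Made-up mutation indicators for 12 patients (1 = mutated, 0 = not mutated)
gene_a <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
gene_b <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1)  # mostly mutated together with gene_a
gene_c <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0)  # never mutated together with gene_a

# fisher.test() cross-tabulates the two vectors and tests for association
co_occurring <- fisher.test(gene_a, gene_b)
exclusive    <- fisher.test(gene_a, gene_c)

# The estimate is the odds ratio; its log is what fills each heatmap cell
log_or_co   <- log(co_occurring$estimate)  # positive: co-mutation
log_or_excl <- log(exclusive$estimate)     # negative: mutual exclusivity
print(c(log_or_co, log_or_excl))

# The p-values indicate how strong the evidence for each association is
print(c(co_occurring$p.value, exclusive$p.value))
```

The log odds ratio only gives the direction and size of an association. With 21 genes there are 210 pairs, so the p-values would need a multiple-testing correction before any pair is called significant.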
References:
https://r4ds.had.co.nz/data-visualisation.html
https://ggplot2.tidyverse.org/
https://r4ds.had.co.nz/graphics-for-communication.html
http://r-statistics.co/ggplot2-Tutorial-With-R.html
http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
https://beanumber.github.io/sds192/lab-ggplot2.html